Indexes and non-key columns in PostgreSQL
What is a Sequential Scan?
What is an Index Scan?
What is an Index Only Scan and what are non-key columns?
For the demonstration, use the queries below to create the table and fill it with sample data:
CREATE TABLE song (
    id serial,
    name varchar(255),
    singer varchar(124),
    releaseyear integer
);
INSERT INTO song (name, singer, releaseyear)
SELECT 'Songname' || id, 'Singer' || id, floor(random()*(2023-1990+1))+1990 FROM generate_series(1, 5000) AS id;
- What is a Sequential scan?
Run the following query in pgAdmin or in a terminal, whichever you prefer.
EXPLAIN ANALYZE SELECT name FROM song WHERE id = 101;
EXPLAIN ANALYZE runs the query and returns information about how PostgreSQL planned and executed it.
You will see something like Seq Scan on song, along with some other information. This means PostgreSQL had to go through all the pages of the table (PostgreSQL stores table data in pages), read every row in each page, and return the row whose id is 101. That is what the query planner is telling us: it performed a sequential scan over all the rows.
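To get a feel for how many pages a sequential scan has to read, the page count of the table can be looked up in the system catalog. This is only a small sketch that assumes the song table created above; the values in pg_class are estimates that ANALYZE refreshes, and the filter on relname assumes no other relation in the database is also called song.

```sql
-- Refresh the planner statistics so relpages and reltuples are current.
ANALYZE song;

-- relpages:  estimated number of pages the table occupies
-- reltuples: estimated number of rows in the table
SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'song';
```

A sequential scan for id = 101 has to read all of those pages, even though only one row matches.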
- What is an Index Scan?
To understand the index scan, we will create an index on the id column:
CREATE INDEX song_id_index ON song(id);
Now execute the same query again:
EXPLAIN ANALYZE SELECT name FROM song WHERE id = 101;
Look at the output.
The query planner now reports an Index Scan using song_id_index. This means PostgreSQL did not read the table data directly: it looked in the index first and found where the row with id 101 is stored (the index returns a page id and an offset for the row). PostgreSQL then only has to fetch that page and extract the single row, which is far better than a sequential scan, where PostgreSQL reads every page and checks every row.
Note that PostgreSQL still needs to fetch that page from the table to read the name.
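The page id and offset mentioned above can be inspected through the ctid system column, which PostgreSQL exposes on every table. A minimal sketch, assuming the song table from this post:

```sql
-- ctid holds the physical location of the row version
-- as (page number, offset within that page).
SELECT ctid, id, name
FROM song
WHERE id = 101;
```

The ctid of a row is not stable: it can change when the row is updated or the table is rewritten, so it is useful for exploration rather than as an identifier.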
- What is an Index Only Scan and what are non-key columns?
An Index Only Scan means the query is answered from the index alone, without touching the heap (the actual table data). To make that possible, we store the frequently accessed data in the index itself.
In our case, we will create an index on the id column and also store the song name in it. Columns stored this way are called non-key columns, and they are added with the INCLUDE clause (available since PostgreSQL 11).
DROP INDEX song_id_index;
CREATE INDEX song_id_index ON song(id) INCLUDE (name);
Now execute the query again:
EXPLAIN ANALYZE SELECT name FROM song WHERE id = 101;
We will see that the planner no longer goes to the actual data (the heap, where the table rows are stored): it performs an Index Only Scan, and the output also shows Heap Fetches: 0.
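If the output shows a Heap Fetches value above 0, one likely reason is that the visibility map of the table has not been updated yet. An Index Only Scan can skip the heap only for pages that are marked all-visible, and that marking is maintained by VACUUM (autovacuum may not have processed a freshly loaded table yet). A hedged sketch of how to check this, assuming the table and index from above:

```sql
-- Update the visibility map and the planner statistics for the table.
-- VACUUM cannot run inside a transaction block.
VACUUM (ANALYZE) song;

-- BUFFERS adds the number of pages touched to the plan output.
EXPLAIN (ANALYZE, BUFFERS) SELECT name FROM song WHERE id = 101;
```

After the VACUUM, the same query would be expected to report Heap Fetches: 0 as long as no rows were modified in between.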
By not fetching from the heap, we potentially avoid extra I/O operations, which results in faster query execution.
But keep in mind that every non-key column we add also increases the size of the index.
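The size cost can be measured directly. The following sketch uses the built-in size functions and assumes the index name song_id_index from this post; running it once before and once after recreating the index with INCLUDE (name) shows the difference.

```sql
-- On-disk size of the index, in human-readable form.
SELECT pg_size_pretty(pg_relation_size('song_id_index')) AS index_size;

-- Size of the table itself (heap only), for comparison.
SELECT pg_size_pretty(pg_relation_size('song')) AS table_size;
```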
To summarize:
| Scan type | Index Fetch | Heap Fetch |
| --- | --- | --- |
| Sequential Scan | NO | YES (all pages) |
| Index Scan | YES | YES (only the matching pages) |
| Index Only Scan | YES | NO |
And now you have an idea of how Sequential, Index, and Index Only Scans work in PostgreSQL.