If a table has 5,000 or 500,000 or 5,000,000,000 records and the requirement is to find the total row count of the table, most database developers will execute COUNT(*) to get it.
I found that many of our team members also execute COUNT(*), but imagine a table with 5,000,000,000 rows: COUNT(*) takes a long time to return the number of records.
Counting the rows of such a big table always creates a performance issue, and it also requires heavy I/O.
If you need the exact row count at a given point in time, COUNT(*) is mandatory.
But you can speed this up dramatically if the count does not have to be exact.
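For reference, the exact count is just the plain aggregate; the table name below is only a placeholder:

-- Exact (but slow) count; public.TableName is a placeholder
SELECT COUNT(*) FROM public.TableName;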
You can use a metadata or statistics table to find the row count, which is very close to the real row count.
Because of the nature of MVCC, you may sometimes find a difference between the actual record count and the statistics table's record count.
You can easily get a rough row count from the statistics tables within a second.
I tested and compared the two results on my local machine with a table of 5,000,000,000 rows.
My COUNT(*) returned a result after 8 to 10 minutes and also consumed 10% to 25% of CPU and memory.
After this, I got the row count from the statistics table, and it did not take even one second. But I found a small difference between the two counts: the statistics table's count was higher than the actual count (greater by 12,585 rows) because of MVCC.
You should configure autovacuum and auto-analyze on the table.
I executed VACUUM and ANALYZE on the table, and now both counts are the same.
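As a sketch, assuming the same placeholder table name, the manual command looks like this; the per-table autovacuum settings shown are example values, not recommendations:

-- Manual maintenance to refresh the statistics (placeholder table name)
VACUUM ANALYZE public.TableName;

-- Optional: per-table autovacuum tuning (example values, tune for your workload)
ALTER TABLE public.TableName
	SET (autovacuum_vacuum_scale_factor = 0.05, autovacuum_analyze_scale_factor = 0.02);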
I would suggest using the statistics tables for rough row counts.
I am providing two different scripts below for finding the rough row count in PostgreSQL.
SELECT reltuples::bigint AS EstimatedCount
FROM pg_class
WHERE oid = 'public.TableName'::regclass;
SELECT
	schemaname
	,relname
	,n_live_tup AS EstimatedCount
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
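If you only need the estimate for one table, you can filter the same view; the schema and table name below are placeholders:

-- Estimated live rows for a single table (placeholder names)
SELECT n_live_tup AS EstimatedCount
FROM pg_stat_user_tables
WHERE schemaname = 'public'
	AND relname = 'tablename';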