This article is half-done without your Comment! *** Please share your thoughts via Comment ***
In this post, I am sharing a script to find the total usage of work spill files for each running query in Greenplum.
Whenever we are doing hash aggregation operation like GROUP BY or DISTINCT on billions of data, query executor holds this data in temporary files which called work spill files.
We should optimize our query to reduce the maximum use of work spill files because it degrades the overall performance and also requires more disk space.
Using below script, we can monitor the usage of work spill file for each running query and we can also optimize the long running queries.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT gwupq.datname ,psa.usename ,gwupq.procpid ,gwupq.sess_id ,sum(gwupq.size)/1024::float as total_spill_size_kb ,sum(gwupq.numfiles) as total_spill_file FROM gp_toolkit.gp_workfile_usage_per_query as gwupq inner join pg_stat_activity psa on gwupq.procpid = psa.procpid and gwupq.sess_id = psa.sess_id GROUP BY 1,2,3,4; |
Leave a Reply