In this post, I am sharing a script to find the total usage of work spill files across all the segments of a Greenplum cluster.
Whenever we run hash aggregation operations such as GROUP BY or DISTINCT on billions of rows and the data does not fit in working memory, the query executor spills the overflow to temporary files, which are called work spill files (workfiles).
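For example, a hash aggregate over a very large table is a typical producer of these spill files. The sketch below uses hypothetical table and column names; in Greenplum you can confirm spilling by running the statement under EXPLAIN ANALYZE and looking for workfile activity on the HashAggregate node:

EXPLAIN ANALYZE
SELECT customer_id                        -- hypothetical column
    ,count(DISTINCT order_id) AS orders   -- hypothetical column
FROM sales_fact                           -- hypothetical billion-row table
GROUP BY customer_id;
-- When working memory is exhausted, the plan output reports
-- spill (workfile) activity for the HashAggregate operator.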
We should tune such queries to minimize the use of work spill files, because heavy spilling degrades overall performance and consumes extra disk space.
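Besides rewriting the query itself, Greenplum exposes configuration parameters to give a statement more working memory or to cap its spill usage. The values below are only a sketch; verify the parameter names, units, and defaults against your Greenplum version before relying on them:

-- Give the statement more working memory so hash aggregates spill less often:
SET statement_mem = '250MB';

-- Limit the total spill file size (in kB) one query may create per segment;
-- a query exceeding this limit is cancelled:
SET gp_workfile_limit_per_query = 2097152;   -- ~2 GB

-- Limit the number of spill files one query may create per segment:
SET gp_workfile_limit_files_per_query = 100000;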
Using the query below, you can check the total number of spill files and their total size for each segment host.
SELECT gwe.datname AS DatabaseName
    ,psa.usename AS UserName
    ,gwe.procpid AS ProcessID
    ,gwe.sess_id AS SessionID
    ,sc.hostname AS HostName
    ,sum(size)/1024::float AS SizePerHost
    ,sum(numfiles) AS NumOfFilesPerHost
FROM gp_toolkit.gp_workfile_entries AS gwe
    INNER JOIN pg_stat_activity AS psa
        ON psa.procpid = gwe.procpid
        AND psa.sess_id = gwe.sess_id
    ,gp_segment_configuration AS sc
    ,pg_filespace_entry AS fe
    ,pg_database AS d
WHERE fe.fsedbid = sc.dbid
    AND gwe.segid = sc.content
    AND gwe.datname = d.datname
    AND sc.role = 'p'
GROUP BY gwe.datname
    ,psa.usename
    ,gwe.procpid
    ,gwe.sess_id
    ,sc.hostname
ORDER BY gwe.datname
    ,psa.usename
    ,gwe.procpid
    ,gwe.sess_id
    ,sc.hostname;
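The size column of gp_workfile_entries is reported in bytes, so SizePerHost above is in KB. If you only need a per-segment rollup without the session details, gp_toolkit also provides a pre-aggregated view; the sketch below assumes the standard gp_workfile_usage_per_segment view is available in your Greenplum version:

-- Total spill size and file count per segment (pre-aggregated view):
SELECT segid
    ,size/1024/1024 AS size_mb   -- size is reported in bytes
    ,numfiles
FROM gp_toolkit.gp_workfile_usage_per_segment
ORDER BY segid;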