In this post, I am sharing a script to find the total usage of work spill files across all the segments of a Greenplum cluster.
Whenever we run hash aggregation operations such as GROUP BY or DISTINCT on billions of rows and the data does not fit in working memory, the query executor spills the overflow to temporary files, which are called work spill files (workfiles).
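For example, a hash aggregate over a very large table is a typical producer of these spill files. The sketch below uses hypothetical table and column names; in Greenplum you can confirm spilling by running the statement under EXPLAIN ANALYZE and looking for workfile activity on the HashAggregate node:

EXPLAIN ANALYZE
SELECT customer_id                        -- hypothetical column
    ,count(DISTINCT order_id) AS orders   -- hypothetical column
FROM sales_fact                           -- hypothetical billion-row table
GROUP BY customer_id;
-- When working memory is exhausted, the plan output reports
-- spill (workfile) activity for the HashAggregate operator.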
We should tune such queries to minimize the use of work spill files, because heavy spilling degrades overall performance and consumes extra disk space.
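Besides rewriting the query itself, Greenplum exposes configuration parameters to give a statement more working memory or to cap its spill usage. The values below are only a sketch; verify the parameter names, units, and defaults against your Greenplum version before relying on them:

-- Give the statement more working memory so hash aggregates spill less often:
SET statement_mem = '250MB';

-- Limit the total spill file size (in kB) one query may create per segment;
-- a query exceeding this limit is cancelled:
SET gp_workfile_limit_per_query = 2097152;   -- ~2 GB

-- Limit the number of spill files one query may create per segment:
SET gp_workfile_limit_files_per_query = 100000;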
Using the query below, you can check the total number of spill files and their total size for each segment host.
SELECT gwe.datname AS DatabaseName
    ,psa.usename AS UserName
    ,gwe.procpid AS ProcessID
    ,gwe.sess_id AS SessionID
    ,sc.hostname AS HostName
    ,sum(size)/1024::float AS SizePerHost
    ,sum(numfiles) AS NumOfFilesPerHost
FROM gp_toolkit.gp_workfile_entries AS gwe
    INNER JOIN pg_stat_activity AS psa
        ON psa.procpid = gwe.procpid
        AND psa.sess_id = gwe.sess_id
    ,gp_segment_configuration AS sc
    ,pg_filespace_entry AS fe
    ,pg_database AS d
WHERE fe.fsedbid = sc.dbid
    AND gwe.segid = sc.content
    AND gwe.datname = d.datname
    AND sc.role = 'p'
GROUP BY gwe.datname
    ,psa.usename
    ,gwe.procpid
    ,gwe.sess_id
    ,sc.hostname
ORDER BY gwe.datname
    ,psa.usename
    ,gwe.procpid
    ,gwe.sess_id
    ,sc.hostname;
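The size column of gp_workfile_entries is reported in bytes, so SizePerHost above is in KB. If you only need a per-segment rollup without the session details, gp_toolkit also provides a pre-aggregated view; the sketch below assumes the standard gp_workfile_usage_per_segment view is available in your Greenplum version:

-- Total spill size and file count per segment (pre-aggregated view):
SELECT segid
    ,size/1024/1024 AS size_mb   -- size is reported in bytes
    ,numfiles
FROM gp_toolkit.gp_workfile_usage_per_segment
ORDER BY segid;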