This article is half-done without your Comment! *** Please share your thoughts via Comment ***
As Greenplum is a MPP architecture, so distribution of data in all segments is the first stuff.
You can distribute your table data using Distributed BY
, and if you are not sure about a particular column, you can create your table using Distributed Randomly.
But tables which are distributed randomly, are not good for table performance because query optimizer will take more time for Randomly distributed tables.
Greenplum DBA can use below script to find randomly distributed tables.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
with cte AS ( SELECT pgn.nspname as SchemaName ,pgc.relname as TableName ,pga.attname as DistributionType FROM ( SELECT gdp.localoid, CASE WHEN ( Array_upper(gdp.attrnums, 1) > 0 ) THEN Unnest(gdp.attrnums) ELSE NULL END AS attnum FROM gp_distribution_policy gdp ORDER BY gdp.localoid ) AS distrokey INNER JOIN pg_class AS pgc ON distrokey.localoid = pgc.oid INNER JOIN pg_namespace pgn ON pgc.relnamespace = pgn.oid LEFT OUTER JOIN pg_attribute pga ON distrokey.attnum = pga.attnum AND distrokey.localoid = pga.attrelid ) select SchemaName ,TableName ,COALESCE(DistributionType,'DISTRIBUTED RANDOMLY') AS DistributionType from cte where COALESCE(DistributionType,'DISTRIBUTED RANDOMLY')='DISTRIBUTED RANDOMLY' ORDER BY SchemaName,TableName |
Leave a Reply