The first important thing is to define the distribution key correctly, because it is the primary factor that decides how data is distributed in Greenplum.
Greenplum is based on an MPP (massively parallel processing) architecture, in which table data is spread across the segments, ideally in equal shares.
Before creating a table, we should analyze the distribution logic and choose distribution key columns whose values are unique, or close to unique, so that rows spread evenly across the segments.
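As a minimal sketch of this idea, the example below creates a table with an explicit distribution key and then counts rows per segment to check for skew. The table name sales_orders and its columns are hypothetical and used only for illustration; gp_segment_id is the Greenplum system column that identifies the segment holding each row.

```sql
-- Hypothetical table: order_id is assumed to be unique per row,
-- which makes it a reasonable distribution key candidate.
CREATE TABLE sales_orders
(
    order_id     bigint,
    customer_id  integer,
    order_date   date
)
DISTRIBUTED BY (order_id);

-- After loading data, count rows per segment.
-- Roughly equal counts suggest an even distribution;
-- a few very large counts suggest skew.
SELECT gp_segment_id, count(*) AS row_count
FROM sales_orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

A low-cardinality column such as order_date would likely concentrate many rows on a few segments, which is why a unique or near-unique column is preferred here.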
My suggestion: Once your data is distributed on a defined distribution key, you should avoid altering that key.
Changing it redistributes the data on disk, which can be resource intensive. Still, I am sharing the steps to ALTER the distribution key.
Alter Distribution key:
ALTER TABLE table_name SET DISTRIBUTED BY (column_name);
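For illustration, here is how the statement might look on a concrete table. This is only a sketch: it assumes a hypothetical table sales_orders that is currently distributed by customer_id and should move to order_id. Because ALTER TABLE holds a lock on the table while rows are moved between segments, it is probably best run in a quiet period.

```sql
-- Hypothetical starting point: sales_orders is DISTRIBUTED BY (customer_id).
-- Changing the hash distribution key moves existing rows between segments.
ALTER TABLE sales_orders SET DISTRIBUTED BY (order_id);

-- To confirm the change in psql, describe the table:
--   \d sales_orders
-- The output should include a line like: Distributed by: (order_id)
```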
For Random distribution:
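Switching a table to random distribution changes only the distribution policy; the existing rows are not moved. To redistribute them and remove any leftover skew, you should also run REORGANIZE=TRUE.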
ALTER TABLE table_name SET DISTRIBUTED RANDOMLY;
ALTER TABLE table_name SET WITH (REORGANIZE=TRUE);
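One way to see the effect of these two statements is to compare the per-segment row counts before and after the reorganize. The sketch below assumes a hypothetical table named sales_orders; the counts will depend on your own data.

```sql
-- Row counts per segment before the change (hypothetical table name).
SELECT gp_segment_id, count(*) AS row_count
FROM sales_orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;

-- Change the policy, then physically redistribute the existing rows.
ALTER TABLE sales_orders SET DISTRIBUTED RANDOMLY;
ALTER TABLE sales_orders SET WITH (REORGANIZE=TRUE);

-- Row counts per segment after the change; they should now be closer to equal.
SELECT gp_segment_id, count(*) AS row_count
FROM sales_orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```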