CREATE TABLE tablename (
    column_name1 data_type NOT NULL,
    column_name2 data_type NOT NULL,
    column_name3 data_type NOT NULL …)
[DISTRIBUTED BY (column_name)]   -- hash algorithm
[DISTRIBUTED RANDOMLY]           -- round-robin algorithm
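The two distribution clauses can be compared with a small Python simulation. It is a sketch under stated assumptions: zlib.crc32 stands in for Greenplum's internal hash function, and the four-segment cluster and the sample rows are hypothetical. It shows that hashing a column with few distinct values piles rows onto a few segments, while round-robin placement spreads them evenly regardless of their values.

```python
import zlib

# Illustrative sketch only: crc32 and the segment count are assumptions,
# not Greenplum's actual internal hash function or cluster layout.

def hash_segment(value, num_segments):
    """DISTRIBUTED BY: equal key values always map to the same segment."""
    return zlib.crc32(str(value).encode("utf-8")) % num_segments

def distribute(rows, num_segments, key=None):
    """Return a list of row counts, one entry per segment.

    key=None models DISTRIBUTED RANDOMLY (round-robin); otherwise rows are
    placed by hashing row[key], as with DISTRIBUTED BY (key).
    """
    counts = [0] * num_segments
    for i, row in enumerate(rows):
        if key is None:
            seg = i % num_segments  # round-robin: deal rows out in turn
        else:
            seg = hash_segment(row[key], num_segments)
        counts[seg] += 1
    return counts

if __name__ == "__main__":
    # 1,000 rows; 'status' has only two distinct values, so it is a poor key.
    rows = [{"id": i, "status": "closed" if i % 10 == 0 else "open"}
            for i in range(1000)]
    print("DISTRIBUTED BY (id):    ", distribute(rows, 4, key="id"))
    print("DISTRIBUTED BY (status):", distribute(rows, 4, key="status"))
    print("DISTRIBUTED RANDOMLY:   ", distribute(rows, 4))
```

Hashing on a high-cardinality column such as id gives a near-even spread, whereas the two-valued status column can occupy at most two of the four segments.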
Data skew and performance
In an MPP shared-nothing environment, the overall response time for a query is determined by the completion time of all segments: the query is not finished until the slowest segment finishes. If the data is skewed, the segments holding more data take longer to complete. The goal is for each segment to hold a comparable number of rows and to perform approximately the same amount of processing. Have a look at the following figure:
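The effect of skew on response time can be illustrated with a minimal model. It assumes (hypothetically) that each segment's work is proportional to its row count and that all segments process rows at the same rate; the row counts are invented for illustration.

```python
# Minimal model, assuming work on each segment is proportional to its row
# count and every segment processes rows at the same hypothetical rate.

def completion_time(rows_per_segment, rows_per_second=1_000_000):
    """The query finishes only when the slowest (most loaded) segment finishes."""
    return max(rows_per_segment) / rows_per_second

def skew_ratio(rows_per_segment):
    """Largest segment divided by the average; 1.0 means perfectly balanced."""
    average = sum(rows_per_segment) / len(rows_per_segment)
    return max(rows_per_segment) / average

if __name__ == "__main__":
    balanced = [2_500_000] * 4                             # 10M rows, even
    skewed = [7_000_000, 1_000_000, 1_000_000, 1_000_000]  # same 10M rows
    for name, seg_rows in (("balanced", balanced), ("skewed", skewed)):
        print(f"{name}: {completion_time(seg_rows):.1f} s, "
              f"skew ratio {skew_ratio(seg_rows):.2f}")
```

Both layouts hold the same 10 million rows, yet under these assumptions the skewed layout takes 7.0 s instead of 2.5 s, because the query waits for the most loaded segment.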
Optimizing the broadcast or redistribution motion for data co-location
For very large tables, a broadcast motion is usually more expensive than a redistribute motion, because a broadcast sends a full copy of the table to every segment. The gp_segments_for_planner configuration parameter can be used to tune the planner's choice between broadcast and redistribute motions.
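To make the two motion types concrete, the sketch below moves rows between in-memory lists that stand in for segments. This is an illustrative model, not Greenplum's executor: the crc32 hash, the orders table and the customer_id join key are all assumptions.

```python
import zlib

# Sketch of the two motion types on in-memory "segments" (lists of rows).
# crc32 stands in for Greenplum's internal hash function (an assumption).

def segment_for(value, num_segments):
    """Segment that owns a given distribution-key value."""
    return zlib.crc32(str(value).encode("utf-8")) % num_segments

def broadcast(segments):
    """Broadcast motion: every segment ends up with a full copy of the table."""
    all_rows = [row for seg in segments for row in seg]
    return [list(all_rows) for _ in segments]

def redistribute(segments, key):
    """Redistribute motion: each row is sent to the segment owning hash(row[key])."""
    out = [[] for _ in segments]
    for seg in segments:
        for row in seg:
            out[segment_for(row[key], len(segments))].append(row)
    return out

def total_rows(segments):
    """Rows held across all segments after a motion."""
    return sum(len(seg) for seg in segments)

if __name__ == "__main__":
    num_segments = 4
    # 1,000 hypothetical order rows spread round-robin, i.e. NOT by customer_id.
    orders = [{"order_id": i, "customer_id": i % 37} for i in range(1000)]
    segments = [orders[s::num_segments] for s in range(num_segments)]
    print("rows after broadcast:   ", total_rows(broadcast(segments)))
    print("rows after redistribute:", total_rows(redistribute(segments, "customer_id")))
```

With four segments the broadcast leaves 4,000 rows in place of 1,000, and that multiplier grows with both the segment count and the table size, which is why broadcasting a very large table is costly. After the redistribute, every row with a given customer_id sits on a single segment, so it is co-located with any table distributed by customer_id under the same hash. The gp_segments_for_planner parameter controls the segment count that the planner assumes when it weighs these two options.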
By default, this parameter is set to 0.
gp_segments_for_planner sets the number of primary segment instances for
the planner to assume in its cost and size estimates.
• If gp_segments_for_planner is set to 0, the value used is the actual number of primary segments. This variable affects the planner's estimates of the number of rows handled by each sending and receiving process in motion operators.
• Increasing the number of primary segments the planner assumes increases the estimated cost of the motion, thereby favoring a redistribute motion over a broadcast motion.
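The effect of the parameter can be sketched with a toy cost model. The functions and cost formulas here are hypothetical simplifications (the real planner also accounts for row width and other cost factors); only the rule that 0 means "use the actual number of primary segments" comes from the text above. Broadcasting the inner table is costed as its row count times the assumed segment count, and redistributing both tables as the sum of their row counts.

```python
# Hypothetical, simplified cost model; Greenplum's real planner cost
# functions are more detailed (row width, network cost factors, etc.).

def planner_segments(gp_segments_for_planner, actual_primary_segments):
    """Segment count the planner assumes; 0 (the default) means 'use the actual count'."""
    if gp_segments_for_planner < 0:
        raise ValueError("gp_segments_for_planner must be >= 0")
    if gp_segments_for_planner == 0:
        return actual_primary_segments
    return gp_segments_for_planner

def rows_per_receiver(n_rows, segments, motion):
    """Rows each receiving process is estimated to handle."""
    if motion == "broadcast":
        return n_rows             # every receiver gets the whole table
    if motion == "redistribute":
        return n_rows / segments  # each receiver gets roughly 1/segments of it
    raise ValueError(f"unknown motion: {motion}")

def choose_motion(n_inner, n_outer, segments):
    """Broadcast the inner table, or redistribute both tables on the join key."""
    broadcast_cost = rows_per_receiver(n_inner, segments, "broadcast") * segments
    redistribute_cost = n_inner + n_outer
    return "broadcast" if broadcast_cost < redistribute_cost else "redistribute"

if __name__ == "__main__":
    actual = 8  # hypothetical cluster with 8 primary segments
    for setting in (0, 16, 64):
        s = planner_segments(setting, actual)
        print(f"gp_segments_for_planner={setting}: planner assumes {s} segments,"
              f" picks {choose_motion(2_000_000, 20_000_000, s)}")
```

With 8 actual segments and the default setting of 0, broadcasting the 2-million-row table looks cheaper in this model; raising the assumed count to 16 makes the broadcast estimate exceed the redistribute estimate, so the choice flips to redistribute.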