performance benefits that outweigh the administrative overhead, do not go
• The partitioning strategy should divide data into roughly equal
parts based on a defining criterion, and that criterion should be used in query
predicates (the WHERE clause). If the query access pattern (SELECT … WHERE)
does not match the partitioning definition, the benefit of partition elimination
cannot be maximized.
• When defining partitioning criteria, it is important to not have overlapping
• Use the pg_partitions view to get information on the partition design.
• To partition an existing table, you must recreate and reload the table as a
Querying Greenplum Database and HD
In the first section of this document, we saw various options for loading data (both
structured and unstructured) into the Greenplum environment in parallel. In this
section, we will focus on how to query data from the Greenplum Database and
HD environments. We will also explore interfaces that help integrate data between
Greenplum Database and HD and leverage the benefit of holding a single copy of
Querying Greenplum Database
Greenplum Database is built on PostgreSQL and supports standard SQL and
PL/pgSQL operations. Additionally, because of its distributed nature, there are a few
new options built for scaled performance across the data cluster.
Let us now look at how Greenplum executes queries across data from all the
segments. Internally, it implements a scatter/gather mechanism that is unique to
Greenplum.
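To make the scatter/gather idea concrete, here is a minimal Python sketch. It is only a toy model and not Greenplum's actual implementation: the names distribute, segment_plan, and master_execute, the four-segment cluster, and the modulo placement rule are all assumptions made for illustration. It shows one plan being sent to every segment, each segment working only on its local rows, and the master combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical cluster size, chosen for illustration


def distribute(rows, key, num_segments=NUM_SEGMENTS):
    """Place each row on a segment by its distribution key.

    A simple modulo stands in for a real hash function (assumption).
    """
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[row[key] % num_segments].append(row)
    return segments


def segment_plan(local_rows, min_amount):
    """The plan every segment runs on its own slice: filter, then partially aggregate."""
    matching = [r["amount"] for r in local_rows if r["amount"] >= min_amount]
    return sum(matching), len(matching)


def master_execute(segments, min_amount):
    """Scatter the same plan to all segments, then gather and combine the partials."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda rows: segment_plan(rows, min_amount), segments))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


# Eight toy rows spread over four segments.
rows = [{"id": i, "amount": i * 10} for i in range(1, 9)]
segments = distribute(rows, key="id")
total, count = master_execute(segments, min_amount=30)
print(total, count)
```

In this sketch, no segment ever sees another segment's rows. The master only adds up the small partial results, which is why this style of execution can scale with the number of segments.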
When working with Greenplum, we issue queries to the database just as we would to
any other database. The internal implementation, however, differs:
• The master host receives, parses, and optimizes the query, creates a parallel
query plan, and dispatches the same plan to all the segments for execution.