housing application, we would need to store and process large volumes of data,
much of it historical in nature.
Partitions need not share the same size or orientation: each one can be either
column-oriented or row-oriented. Data can be partitioned by a selected time frame.
The example shows a rolling management scheme in which three months of data are
maintained at a time. The scheme is as follows:
• Anything older than three months is moved to deep history (most likely a
compressed, column-oriented store), as per the previous figure
• All in-between months are maintained at a second level of storage (row-oriented)
Each of these storage options can be compressed or uncompressed. The process is
customizable to organizational needs, and for the user, data access is seamless
and requires no intervention to manage the internals.
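To make the rolling scheme concrete, here is a minimal Python sketch that decides which storage tier a row belongs to from its date. It is an illustration only, not Greenplum's mechanism: the function names, the tier labels, the treatment of the current month as its own tier, and the choice to count age in whole calendar months are all assumptions made for the example.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later`; the day of month is ignored."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def storage_tier(row_date: date, today: date, window_months: int = 3) -> str:
    """Return a (hypothetical) tier label for a row under a rolling window.

    Assumed boundaries: age 0 is the current month, ages 1 .. window_months - 1
    are the in-between months, and anything at or beyond window_months goes to
    deep history.
    """
    age = months_between(row_date, today)
    if age < 0:
        raise ValueError("row_date is in the future")
    if age == 0:
        return "current"        # assumed tier for the month being loaded
    if age < window_months:
        return "second_level"   # in-between months, row-oriented
    return "deep_history"       # compressed, column-oriented


today = date(2024, 6, 15)
tiers = {d.isoformat(): storage_tier(d, today)
         for d in (date(2024, 6, 1), date(2024, 5, 20), date(2024, 3, 31))}
```

As the window rolls forward each month, the same row date maps to an older tier, which is the movement the scheme describes.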
Let us now look at how Greenplum stores data across various hosts and segments.
All tables in Greenplum are distributed: each table is divided into non-overlapping
sets of rows, or parts. Each part resides on a single database, known as a
segment, within the Greenplum Database system. The parts are spread evenly
across all of the available segments by a hashing algorithm.
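The idea of hash distribution can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Greenplum's actual algorithm: SHA-256 stands in for the database's internal hash function, and `segment_for`, `distribute`, and the sample `customer_id` column are hypothetical names chosen for the example.

```python
import hashlib


def segment_for(key_values: tuple, num_segments: int) -> int:
    """Map a distribution-key value to a segment number in [0, num_segments)."""
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()   # stand-in hash, deterministic
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute(rows, key_columns, num_segments):
    """Split rows into non-overlapping parts, one list per segment."""
    segments = {i: [] for i in range(num_segments)}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        segments[segment_for(key, num_segments)].append(row)
    return segments


# 1,000 sample rows with a unique key, spread over four segments.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
segments = distribute(rows, ["customer_id"], num_segments=4)
counts = {seg: len(part) for seg, part in segments.items()}
```

Each row lands on exactly one segment, and rows with the same key always land on the same segment. With a unique key the counts come out roughly equal; a key with few distinct values would concentrate rows on a few segments, which is why the choice of distribution key matters.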
Distribution is determined at table creation time by selecting a distribution key of
one or more columns. The distribution key is usually the primary key or any unique