master host failure the standby master is activated and constructed using the
• Dual interconnect switches: A highly available interconnect can be
achieved by deploying redundant network interfaces on all Greenplum hosts
and dual Gigabit Ethernet switches. The default configuration is to have one
network interface per primary segment instance on a segment host (in the
DCA, or Data Computing Appliance, both interconnects are 10 Gigabit Ethernet
by default).
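To make the default layout concrete, the following minimal sketch maps the primary segment instances on one host to that host's network interfaces. The function name and the `<host>-<n>` interface naming (for example `sdw1-1`, `sdw1-2`) are assumptions made for illustration. When there are as many interfaces as primaries, each primary gets a dedicated interface, which matches the default described above. With fewer interfaces, the sketch simply shares them round-robin.

```python
def assign_interfaces(host, num_primaries, num_nics):
    """Map each primary segment instance on `host` to a network interface.

    Hypothetical helper: interfaces are named <host>-<n> (e.g. sdw1-1) purely
    for illustration. With num_primaries == num_nics every primary gets its
    own interface; otherwise primaries share interfaces round-robin.
    """
    if num_nics < 1:
        raise ValueError("a segment host needs at least one interface")
    return {
        f"{host}/primary{seg}": f"{host}-{seg % num_nics + 1}"
        for seg in range(num_primaries)
    }

# Default layout: four primaries, four interfaces, one interface each.
print(assign_interfaces("sdw1", 4, 4))
# → {'sdw1/primary0': 'sdw1-1', 'sdw1/primary1': 'sdw1-2', 'sdw1/primary2': 'sdw1-3', 'sdw1/primary3': 'sdw1-4'}
```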
High-speed data loading using external tables
Data can be loaded into Greenplum Database from an ETL host connected to the
interconnect. The gpfdist utility, Greenplum's parallel file distribution program,
can serve files from external ETL hosts and load data into all segments
simultaneously using the scatter-and-gather method. gpfdist runs as a lightweight
HTTP server. The query execution plan is broadcast to all segments, even those that
hold no relevant data. Each segment then executes the plan against its own share of
the data, and all segments do this work in parallel.
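The flow above can be pictured with a small, single-process simulation. This is a sketch under stated assumptions, not gpfdist's actual protocol. Each "segment" receives its own slice of the source lines (scatter), the same plan runs on every segment concurrently, including segments whose slice is empty, and each parsed row is routed to the segment that owns its distribution key (gather). The CRC-32 hash is a hypothetical stand-in for Greenplum's distribution hash, and all function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import zlib

def target_segment(key, num_segments):
    # Stand-in for Greenplum's distribution hash (assumption: CRC-32 of the
    # key text); it only needs to map each key to exactly one owning segment.
    return zlib.crc32(str(key).encode("utf-8")) % num_segments

def scatter(lines, num_segments):
    # Every segment receives its own slice of the source, as if each segment
    # were pulling a different portion of the file from the HTTP file server.
    return [lines[i::num_segments] for i in range(num_segments)]

def run_plan_on_segment(chunk, num_segments):
    # The same plan runs on every segment: parse rows, work out their owner.
    routed = []
    for line in chunk:
        key, value = line.split(",", 1)
        routed.append((target_segment(key, num_segments), (key, value)))
    return routed

def parallel_load(lines, num_segments=4):
    storage = [[] for _ in range(num_segments)]
    chunks = scatter(lines, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        # The plan is dispatched to all segments, including any whose slice
        # is empty, and the segments execute it concurrently.
        results = pool.map(run_plan_on_segment, chunks, [num_segments] * num_segments)
        for routed in results:
            # Gather: each row ends up on the segment that owns its key.
            for seg, row in routed:
                storage[seg].append(row)
    return storage

storage = parallel_load([f"{i},row{i}" for i in range(10)])
for seg, rows in enumerate(storage):
    print(seg, rows)
```

Threads stand in for segment hosts here only to keep the example self-contained. In a real cluster, each segment is a separate database instance that reads from gpfdist over the network.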
External tables are used to access data that resides outside Greenplum Database.
Large amounts of data can be loaded or unloaded through them. External tables
support the following sources and protocols:
• CSV (Comma Separated Values) and other regular files (file://)
• Hadoop Distributed File System (HDFS) data (gphdfs://)
• Web-based external sources serving text data (http://)
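As a rough illustration of how each protocol shows up in a table definition, the sketch below builds a readable external table statement from a LOCATION URI. It assumes Greenplum 4.x-era `CREATE EXTERNAL TABLE` syntax, where `http://` locations require an external *web* table. The helper name, the `PROTOCOLS` table, and the table name, host, and path in the usage are hypothetical. Check the `CREATE EXTERNAL TABLE` reference for your version's format and option requirements.

```python
from urllib.parse import urlparse

# Protocols listed in the text above; each value is a short description.
PROTOCOLS = {
    "file": "files on a segment host",
    "gphdfs": "Hadoop Distributed File System",
    "http": "web server (external web table)",
}

def external_table_ddl(name, columns, location, fmt="CSV"):
    """Build a readable external table definition for one LOCATION URI.

    Hypothetical helper: it only assembles SQL text and does not talk to
    a database.
    """
    scheme = urlparse(location).scheme
    if scheme not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {scheme!r}")
    # http:// locations are only accepted by external *web* tables.
    kind = "EXTERNAL WEB TABLE" if scheme == "http" else "EXTERNAL TABLE"
    cols = ", ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE READABLE {kind} {name} ({cols})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT '{fmt}';"
    )

print(external_table_ddl(
    "ext_sales",                       # hypothetical table name
    [("id", "int"), ("amount", "numeric")],
    "file://seghost1/data/sales.csv",  # hypothetical host and path
))
```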
External table types
Greenplum supports two kinds of external tables:
• Readable (read-only) tables, used for data loading.
• Writable (write-only) tables, used for data unloading. A writable external
table lets you select rows from database tables and output them to files.
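The one-way nature of the two table kinds can be modelled with a toy class bound to an in-memory file. This is a hypothetical model, not Greenplum code. A readable table only supports reading rows (the analogue of `SELECT`, used for loading), and a writable table only supports appending rows (the analogue of `INSERT INTO ... SELECT`, used for unloading). The class name, modes, and sample rows are illustrative assumptions.

```python
import csv
import io

class ExternalTable:
    """Toy model of an external table bound to a file-like object.

    mode='r' models a readable table (SELECT only, used for loading);
    mode='w' models a writable table (INSERT only, used for unloading).
    """

    def __init__(self, fileobj, mode):
        if mode not in ("r", "w"):
            raise ValueError("mode must be 'r' or 'w'")
        self.fileobj, self.mode = fileobj, mode

    def select(self):
        if self.mode != "r":
            raise PermissionError("cannot SELECT from a writable external table")
        self.fileobj.seek(0)
        return [tuple(row) for row in csv.reader(self.fileobj)]

    def insert(self, rows):
        if self.mode != "w":
            raise PermissionError("cannot INSERT into a readable external table")
        csv.writer(self.fileobj).writerows(rows)

# Unload: select matching rows from a (hypothetical) database table into a file.
orders = [(1, "shipped"), (2, "pending"), (3, "shipped")]
out = io.StringIO()
ExternalTable(out, "w").insert(r for r in orders if r[1] == "shipped")

# Load: read the same file back through a readable table.
loaded = ExternalTable(io.StringIO(out.getvalue()), "r").select()
print(loaded)  # → [('1', 'shipped'), ('3', 'shipped')]
```

Values come back as strings because CSV carries no types. In Greenplum, the column types declared on the external table take care of that conversion.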
Polymorphic data storage and historic data management
Polymorphic data storage is a unique and differentiating feature of Greenplum. It
facilitates configuring optimal storage, compression, and execution settings to support
row/column-oriented storage and retrieval. As a core requirement for a data ware-