• Querying Hadoop (HD)
• Greenplum Data Computing Appliance (DCA) and monitoring
• Running analytic functions
• R and Weka with Greenplum
• Advanced SQL options on Greenplum for analytics (window functions
and aggregates)
• MADlib with Greenplum
• Using Chorus
Data loading for Greenplum Database
This section provides step-by-step instructions for every approach to loading
structured data into Greenplum Database (ELT using external tables) and
unstructured data into HD using the proprietary utilities in the Greenplum
distribution. For Greenplum Database, we will also look at options for integrating
with an external ETL tool such as Informatica PowerCenter through a specialized
connector called PowerExchange (PWX).
Greenplum data loading options
Data can be loaded, transformed, and formatted in Greenplum using built-in utilities
and tools. Some of these options load data in parallel; others load it sequentially.
The following are the different ways to load data into Greenplum Database:
• INSERT: The INSERT command is a standard SQL command that loads data into
database tables row by row. This option should not be used for loading large
volumes of data: every row is routed through the master node, which can
become a bottleneck. The command is commonly used in JDBC/ODBC-based
communication.
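The row-by-row pattern described for INSERT can be sketched as follows. This is a minimal illustration under stated assumptions, not a Greenplum-specific implementation: Python's built-in sqlite3 module stands in for the database connection so that the sketch runs without a cluster, and the sales table and its columns are hypothetical. Against Greenplum, the same statements would be issued through a JDBC/ODBC or PostgreSQL-compatible driver, whose parameter placeholder style may differ from the ? used here.

```python
import sqlite3

# Stand-in connection (assumption): sqlite3 is used only so the sketch runs
# anywhere. With Greenplum, the connection would come from a database driver.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical target table, for illustration only.
cur.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")

rows = [
    (1, "east", 120.50),
    (2, "west", 75.00),
    (3, "north", 310.25),
]

# Row-by-row loading: one INSERT statement is sent per row. In Greenplum,
# each of these statements would pass through the master node.
for row in rows:
    cur.execute(
        "INSERT INTO sales (id, region, amount) VALUES (?, ?, ?)",
        row,
    )
conn.commit()

# Verify what was loaded.
cur.execute("SELECT COUNT(*), SUM(amount) FROM sales")
row_count, total_amount = cur.fetchone()
print(row_count, total_amount)
```

Because every row costs a separate statement, this approach suits small or incremental loads better than bulk loads.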