Implementing Analytics with Greenplum UAP - Getting Started with Greenplum for Big Data Analytics


• Querying Hadoop (HD)

• Querying Greenplum and Hadoop (combining structured and unstructured data)

• Greenplum Data Computing Appliance (DCA) and monitoring

• Running analytic functions

• R and Weka with Greenplum

• Advanced SQL options on Greenplum for analytics (window functions and aggregates)

• MADlib with Greenplum

• Using Chorus

Data loading for Greenplum Database and HD

This section provides step-by-step instructions for loading structured data into Greenplum Database (ELT using external tables) and unstructured data into HD using the proprietary utilities that ship with the Greenplum distribution. For Greenplum Database, we will also look at options for integrating with an external ETL tool such as Informatica PowerCenter through a specialized connector called the PowerExchange (PWX) connector.

Greenplum data loading options

Data can be loaded, transformed, and formatted in Greenplum using built-in utilities and tools. Some of these options load data in parallel and others load it sequentially. The following are the different ways to load data into Greenplum Database:

• INSERT: The INSERT command is a standard SQL command that loads data into database tables row by row. This option should not be used for loading large volumes of data, because every row is routed through the master node, which can become a bottleneck. This command is commonly used in JDBC/ODBC-based communication.

• Syntax:
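
As a minimal sketch of the row-by-row behaviour described above, the following Python example issues one INSERT statement per row. It rests on several assumptions that are not from the original text: the sales table and its columns are hypothetical, and Python's built-in sqlite3 module is used purely as a stand-in so that the example runs without a Greenplum cluster. The INSERT statement itself is standard SQL; against Greenplum, the same statement would be sent through a PostgreSQL-compatible driver, and the placeholder style may differ (for example, psycopg2 uses %s rather than ?).

```python
import sqlite3

# Stand-in database: sqlite3 is used only so the sketch runs anywhere.
# It is NOT Greenplum; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")

# Standard SQL form: INSERT INTO <table> (<columns>) VALUES (<values>)
insert_sql = "INSERT INTO sales (id, region, amount) VALUES (?, ?, ?)"

# A single-row insert.
conn.execute(insert_sql, (1, "east", 120.50))

# Loading more rows means one statement per row. On Greenplum, each of
# these statements would pass through the master node, which is why this
# approach does not scale to large volumes.
rows = [(2, "west", 75.00), (3, "north", 210.25)]
for row in rows:
    conn.execute(insert_sql, row)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(row_count, total)
```

The loop makes the cost visible: every additional row is another statement handled individually, which is acceptable for small application-driven writes but not for bulk loads.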
