[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt>
For example:
grunt> CountByYear = FOREACH GroupByYear
>> GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
2012-11-05 01:09:11,996 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> STORE CountByYear INTO '/user/work/output/pig_output_bookx' USING PigStorage('\t');
Data communication between Greenplum Database and Hadoop (using external tables)
Greenplum supports exchanging HDFS data with Greenplum Database through external tables, allowing HDFS data to be read and written directly from the Greenplum Database with full SQL syntax support.
This combination leverages the full parallelism of both the Greenplum Database and the HDFS, utilizing the resources of all Greenplum segments when reading and writing data with the HDFS.
Data is read into the Greenplum Database as an external table directly from the HDFS DataNodes, and it is written out from the Greenplum Database segment servers to the HDFS. This relies on the HDFS to distribute the data load evenly across the DataNodes.
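As a sketch of this flow using Greenplum's gphdfs external-table protocol (the table names and column layout below are assumptions; the HDFS path reuses the Pig output location from the earlier example), a readable external table pulls data in and a writable one pushes results back out:

```sql
-- Readable external table: all Greenplum segments read in parallel,
-- directly from the HDFS DataNodes.
-- Column layout is assumed: the Pig job above wrote one "year:count" field.
CREATE EXTERNAL TABLE ext_year_counts (year_count text)
LOCATION ('gphdfs://localhost:9000/user/work/output/pig_output_bookx')
FORMAT 'TEXT' (DELIMITER E'\t');

-- Load into a regular Greenplum table using ordinary SQL.
CREATE TABLE year_counts AS
SELECT year_count FROM ext_year_counts
DISTRIBUTED BY (year_count);

-- Writable external table: segments write back to HDFS in parallel.
-- The export path is a placeholder.
CREATE WRITABLE EXTERNAL TABLE ext_year_counts_out (LIKE year_counts)
LOCATION ('gphdfs://localhost:9000/user/work/export/year_counts')
FORMAT 'TEXT' (DELIMITER E'\t');

INSERT INTO ext_year_counts_out SELECT * FROM year_counts;
```

Because the external tables are ordinary SQL objects, any query, join, or filter can be applied to the HDFS data without a separate load step.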
Following are the steps to read data from Hadoop HDFS into Greenplum Database: