[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
[main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt>
For example:
grunt> CountByYear = FOREACH GroupByYear
>> GENERATE CONCAT((chararray)$0, CONCAT(':', (chararray)COUNT($1)));
2012-11-05 01:09:11,996 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_DOUBLE 1 time(s).
grunt> STORE CountByYear INTO '/user/work/output/pig_output_bookx' USING PigStorage('\t');
Data communication between Greenplum Database and Hadoop (using external tables)
Greenplum supports exchanging HDFS data with Greenplum Database through external tables, allowing HDFS data to be read and written directly from the Greenplum Database with full SQL syntax support.
This combination leverages the full parallelism of both the Greenplum Database and the HDFS, utilizing the resources of all Greenplum segments when reading and writing data with the HDFS.
Data is read into the Greenplum Database as an external table directly from the HDFS DataNodes, and it is written out from the Greenplum Database segment servers to the HDFS. This relies on the HDFS to distribute the data load evenly across the DataNodes.
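As a sketch of this flow using Greenplum's gphdfs external-table protocol (the table names and column layout below are assumptions; the HDFS path reuses the Pig output location from the earlier example), a readable external table pulls data in and a writable one pushes results back out:

```sql
-- Readable external table: all Greenplum segments read in parallel,
-- directly from the HDFS DataNodes.
-- Column layout is assumed: the Pig job above wrote one "year:count" field.
CREATE EXTERNAL TABLE ext_year_counts (year_count text)
LOCATION ('gphdfs://localhost:9000/user/work/output/pig_output_bookx')
FORMAT 'TEXT' (DELIMITER E'\t');

-- Load into a regular Greenplum table using ordinary SQL.
CREATE TABLE year_counts AS
SELECT year_count FROM ext_year_counts
DISTRIBUTED BY (year_count);

-- Writable external table: segments write back to HDFS in parallel.
-- The export path is a placeholder.
CREATE WRITABLE EXTERNAL TABLE ext_year_counts_out (LIKE year_counts)
LOCATION ('gphdfs://localhost:9000/user/work/export/year_counts')
FORMAT 'TEXT' (DELIMITER E'\t');

INSERT INTO ext_year_counts_out SELECT * FROM year_counts;
```

Because the external tables are ordinary SQL objects, any query, join, or filter can be applied to the HDFS data without a separate load step.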
Following are the steps to read data from Hadoop HDFS into Greenplum Database: