If everything works and both Cassandra and Hadoop are up, you can open the Pig
console to run queries interactively as follows:
pig$ bin/pig
2013-07-22 13:32:22,709 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-07-22 13:32:22,710 [main] INFO org.apache.pig.Main - Logging error messages to: /home/nishant/apps/pig-0.11.1/pig_1374480142703.log
2013-07-22 13:32:22,757 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/nishant/.pigbootup not found
2013-07-22 13:32:23,080 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2013-07-22 13:32:24,133 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
grunt>
Let's copy some Hadoop XML files into HDFS and run a word count on them as follows:
# Copy all the files in $HADOOP_HOME/conf to pigdata in HDFS
$ bin/hadoop fs -put conf pigdata
# --- in pig console ---
# load all the files from HDFS
grunt> A = load './pigdata';
# loop line by line over all the input files in A and split them into words
grunt> B = foreach A generate
flatten(TOKENIZE((chararray)$0)) as word;
# Group the tokenized words into variable C, group by attribute "word"
grunt> C = group B by word;
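To make the shape of the three relations concrete, here is a rough Python sketch of what A, B, and C hold at this point. It is an illustration only, not how Pig executes the job: the sample lines are hypothetical, and the tokenize helper merely approximates Pig's TOKENIZE by splitting on spaces, double quotes, commas, parentheses, and asterisks.

```python
import re
from collections import defaultdict

# Hypothetical sample lines standing in for the files loaded from ./pigdata
lines = [
    "hadoop pig hadoop",
    "pig cassandra",
]


def tokenize(line):
    """Approximate Pig's TOKENIZE: split on space, double quote, comma,
    parentheses, and asterisk, dropping empty tokens."""
    return [tok for tok in re.split(r'[ ",()*]+', line) if tok]


# A: one record per input line
A = lines

# B: flatten(TOKENIZE($0)) turns each line into one record per word
B = [word for line in A for word in tokenize(line)]

# C: group B by word; each key maps to the bag of its occurrences
C = defaultdict(list)
for word in B:
    C[word].append(word)

for word, bag in C.items():
    print(word, bag)
```

Each entry in C pairs a word with the bag of all its occurrences, so the size of each bag is that word's count.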