This script lists the contents of the Pig job results directory and then prints the last 10 lines of the part file in
that directory, which holds the word-count job data. It does this with the Hadoop file system cat command and the
Linux tail command. To run the job, execute the run_wc2.sh Bash script:
[hadoop@hc1nn pig]$ ./run_wc2.sh
Deleted hdfs://hc1nn:54310/user/hadoop/pig/wc_result1
2014-06-24 19:06:44,651 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.1 (r1585011) compiled Apr 05 2014, 01:41:34
2014-06-24 19:06:44,652 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig/pig_1403593604648.log
...............................
Input(s):
Successfully read 10377 records (410375 bytes) from: "/user/hadoop/pig/10031.txt"
Output(s):
Successfully stored 9641 records (95799 bytes) in: "/user/hadoop/pig/wc_result1"
Counters:
Total records written : 9641
Total bytes written : 95799
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_201406241807_0002
2014-06-24 19:07:23,252 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
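The output above suggests that the run script first deletes the previous results directory and then submits the Pig
script. The following is a minimal sketch of that sequence, not the author's run_wc2.sh: the function name
run_wordcount, the PIG_SCRIPT variable, and the wordcount.pig file name are assumptions made for illustration, and the
use of hadoop fs -rmr assumes a Hadoop 1.x style client. Only the results path is taken from the log.

```shell
#!/bin/bash
# Hypothetical sketch of a run script; this is not the author's run_wc2.sh.

# Results directory, taken from the "Deleted ..." and "Output(s)" lines in the log.
RESULT_DIR=/user/hadoop/pig/wc_result1
# Assumed Pig script name; replace it with the script you actually use.
PIG_SCRIPT=${PIG_SCRIPT:-wordcount.pig}

run_wordcount() {
  # Remove earlier results, because Pig refuses to store into an existing directory.
  hadoop fs -rmr "$RESULT_DIR"
  # Submit the Pig script to the cluster.
  pig "$PIG_SCRIPT"
}
```

Adding a final run_wordcount line turns the sketch into a script that can be executed directly. Deleting the old
results first is what allows the job to be rerun without changing the output path.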
You then list the job results by running the result_wc.sh script, which, as described above, outputs the contents of
the Pig job results directory and the last 10 lines of the Pig job data file:
[hadoop@hc1nn pig]$ ./result_wc.sh
Found 3 items
-rw-r--r-- 1 hadoop supergroup 0 2014-06-24 19:07 /user/hadoop/pig/wc_result1/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-06-24 19:06 /user/hadoop/pig/wc_result1/_logs
-rw-r--r-- 1 hadoop supergroup 95799 2014-06-24 19:07 /user/hadoop/pig/wc_result1/part-r-00000
unexceptionable 1
constitutionally 1
misunderstanding 1
tintinnabulation 1
unenforceability 1
Anthropomorphites 1
contradistinction 1
preconsiderations 1
undistinguishable 1
transcendentalists 1
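The listing and tail steps described above can be sketched as follows. This is an illustrative reconstruction under
stated assumptions, not the author's result_wc.sh: the function name show_results and the two variable names are
hypothetical, while the directory and part file paths are copied from the listing above.

```shell
#!/bin/bash
# Hypothetical sketch of a results script; this is not the author's result_wc.sh.

# Results directory and part file, taken from the directory listing in the output.
RESULT_DIR=/user/hadoop/pig/wc_result1
PART_FILE="$RESULT_DIR/part-r-00000"

show_results() {
  # List the contents of the Pig job results directory.
  hadoop fs -ls "$RESULT_DIR"
  # Stream the part file out of HDFS and keep only its last 10 lines.
  hadoop fs -cat "$PART_FILE" | tail -n 10
}
```

Adding a final show_results line makes the sketch executable as a script. Piping the cat output through tail keeps
the terminal output short, at the cost of streaming the whole part file from HDFS.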
 