2013-11-26 10:55:49 Processing rows: 330936 Hashtable size: 330936 Memory usage: 149795152 rate: 0.166
2013-11-26 10:55:49 Dump the hashtable into file: file:/tmp/msgbigdata/hive_2013-11-26_22-55-34_959_3143934780177488621/-local-10002/
2013-11-26 10:55:56 Upload 1 File to: file:/tmp/msgbigdata/hive_2013-11-26_22-55-34_959_3143934780177488621/-local-10002/HashTable-Stage-4/MapJoin-mapfile01-.hashtable File size: 39685647
2013-11-26 10:55:56 End of local task; Time Taken: 13.203 sec.
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
Mapred Local Task Succeeded . Convert the Join into MapJoin
Launching Job 2 out of 2
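The "Processing rows" progress line at the top of this log can also be read programmatically. The following Python sketch extracts its numeric fields. The regular expression is written against the sample line shown here and may need adjusting for other Hive versions. The sketch also assumes, without confirmation from this text, that `rate` is the fraction of the local task's maximum JVM heap currently in use, so dividing memory usage by rate gives a rough estimate of that heap size.

```python
import re

# Sample progress line copied from the local-task log above.
LOG_LINE = ("2013-11-26 10:55:49 Processing rows: 330936 Hashtable size: 330936 "
            "Memory usage: 149795152 rate: 0.166")

# Pattern written against the sample line; spacing (tabs vs. spaces)
# may differ between Hive versions.
PROGRESS_RE = re.compile(
    r"Processing rows:\s+(?P<rows>\d+)\s+"
    r"Hashtable size:\s+(?P<size>\d+)\s+"
    r"Memory usage:\s+(?P<mem>\d+)\s+"
    r"rate:\s+(?P<rate>[\d.]+)"
)


def parse_progress(line):
    """Return the numeric fields of a MapJoin local-task progress line, or None."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    return {
        "rows": int(m.group("rows")),
        "hashtable_size": int(m.group("size")),
        "memory_bytes": int(m.group("mem")),
        "rate": float(m.group("rate")),
    }


def implied_heap_bytes(stats):
    # ASSUMPTION: rate = memory_bytes / maximum heap of the local task JVM.
    # The logged rate is rounded, so this is only an approximation.
    return stats["memory_bytes"] / stats["rate"]


stats = parse_progress(LOG_LINE)
print(stats)
print(f"In-memory hash table: {stats['memory_bytes'] / 2**20:.1f} MiB")
print(f"Implied max heap: {implied_heap_bytes(stats) / 2**20:.0f} MiB")
```

Comparing the in-memory figure with the "File size" reported for the uploaded .hashtable file is a quick way to see how much smaller the serialized hash table is than its in-memory form.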
Hive is a common choice in the Hadoop world. SQL users can get started with Hive almost immediately, because its schema-based data structure is already familiar to them, and their knowledge of SQL syntax carries over directly to writing Hive queries.
Pig Jobs
Pig is a set-based data-transformation tool that runs on top of Hadoop and cluster storage. Pig offers a command-line shell for user input called Grunt, and Pig scripts are written in a language called Pig Latin. Pig can run on the name-node host or on a client machine, and it can run jobs that read data from HDFS/WASB and process that data using the MapReduce framework. The biggest advantage, again, is that it frees the developer from writing complex MapReduce programs.
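To make "set-based" concrete, the following Python sketch mimics in memory what a small Pig Latin script does with LOAD, FILTER, GROUP, and FOREACH ... GENERATE. The input records, the path, and the field names are hypothetical, and the Pig Latin in the comments is illustrative only. On a cluster, Pig would compile an equivalent script into MapReduce jobs instead of running it in a single process.

```python
from collections import defaultdict

# Hypothetical input: (level, message) records standing in for a file
# that a Pig script would LOAD from HDFS/WASB.
records = [
    ("ERROR", "disk full"),
    ("INFO", "job started"),
    ("ERROR", "timeout"),
    ("WARN", "slow node"),
    ("ERROR", "disk full"),
]

# Roughly equivalent Pig Latin (illustrative, hypothetical path and fields):
#   logs    = LOAD '/example/logs' USING PigStorage(',') AS (level:chararray, msg:chararray);
#   errors  = FILTER logs BY level == 'ERROR';
#   grouped = GROUP errors BY msg;
#   counts  = FOREACH grouped GENERATE group AS msg, COUNT(errors) AS n;


def run_pipeline(rows):
    # FILTER: keep only ERROR records.
    errors = [r for r in rows if r[0] == "ERROR"]
    # GROUP ... BY msg: collect matching records into one bag per message.
    grouped = defaultdict(list)
    for level, msg in errors:
        grouped[msg].append((level, msg))
    # FOREACH ... GENERATE group, COUNT(bag): one (msg, count) tuple per group,
    # sorted so the output order is deterministic.
    return sorted((msg, len(bag)) for msg, bag in grouped.items())


print(run_pipeline(records))
```

Each step operates on a whole collection rather than on individual rows, which is the style of thinking Pig Latin encourages.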
Configuration File
The configuration file for Pig is , and it is found in the C:\apps\dist\pig-\conf\
directory of the HDInsight name node. It contains several key parameters for controlling job submission and
execution. Listing 13-15 highlights a few of them.
Listing 13-15. file
#Verbose print all log messages to screen (default to print only INFO and above to screen)
#Exectype local|mapreduce, mapreduce is default
#The following two parameters are to help estimate the reducer number
#Performance tuning properties
These properties help you control the number of mappers and reducers, as well as several other performance-tuning options related to internal dataset joins and memory usage.
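The property lines of Listing 13-15 did not survive, so the following Python sketch makes two assumptions. First, it assumes the reducer-estimation pair is `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max`. Second, it assumes the estimation rule is to divide the input size by the bytes-per-reducer value, round up, and cap the result at the maximum. It parses a small properties-style fragment and applies that rule. Check both the names and the rule against the documentation for your Pig version.

```python
import math

# Hypothetical properties fragment. The property names and values are
# ASSUMPTIONS, not text recovered from Listing 13-15.
CONFIG_TEXT = """
#Exectype local|mapreduce, mapreduce is default
exectype=mapreduce
#The following two parameters are to help estimate the reducer number
pig.exec.reducers.bytes.per.reducer=1000000000
pig.exec.reducers.max=999
"""


def parse_properties(text):
    """Minimal key=value parser that skips blank lines and '#' comments.

    Real Java .properties files also allow ':' separators, '!' comments,
    and line continuations; this sketch ignores those cases.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def estimate_reducers(input_bytes, props):
    """ASSUMED rule: ceil(input / bytes-per-reducer), clamped to [1, max]."""
    per_reducer = int(props["pig.exec.reducers.bytes.per.reducer"])
    max_reducers = int(props["pig.exec.reducers.max"])
    estimate = math.ceil(input_bytes / per_reducer)
    return max(1, min(max_reducers, estimate))


props = parse_properties(CONFIG_TEXT)
print(props["exectype"])
print(estimate_reducers(5_500_000_000, props))
```

With these assumed values, 5.5 GB of input would be split across six reducers. Very large inputs stop at the configured maximum, and empty input still gets one reducer.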