Like Hadoop, Pig originated at Yahoo! in 2006. Pig was transferred to the
Apache Software Foundation in 2007 and had its first release as an Apache Hadoop
subproject in 2008. As Pig has evolved, three main characteristics have
persisted: ease of programming, behind-the-scenes code optimization, and
extensibility of capabilities [24].
With Apache Hadoop and Pig already installed, a Pig session begins by typing
pig at the command prompt to enter the Pig execution environment and then
entering a sequence of Pig Latin instructions at the grunt prompt.
An example of Pig-specific commands is shown here:
$ pig
grunt> records = LOAD '/user/customer.txt' AS
           (cust_id:INT, first_name:CHARARRAY,
            last_name:CHARARRAY,
            email_address:CHARARRAY);
grunt> filtered_records = FILTER records
           BY email_address matches '.*@isp.com';
grunt> STORE filtered_records INTO '/user/isp_customers';
grunt> quit
$
At the first grunt prompt, a text file is designated by the Pig variable
records with four defined fields: cust_id, first_name, last_name, and
email_address. Next, the variable filtered_records is assigned those
records where the email_address ends with @isp.com to extract the customers
whose e-mail address is from a particular Internet service provider (ISP). Using
the STORE command, the filtered records are written to an HDFS folder,
isp_customers. Finally, to exit the interactive Pig environment, execute the
quit command. Alternatively, these individual Pig commands could be written
to a file, filter_script.pig, and submitted at the command prompt as
follows:
$ pig filter_script.pig
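In this case, filter_script.pig would simply contain the statements entered
earlier at the grunt prompt, without the prompts themselves; the quit command
is unnecessary in batch mode:
-- filter_script.pig: extract customers whose e-mail
-- address belongs to the @isp.com domain
records = LOAD '/user/customer.txt' AS
          (cust_id:INT, first_name:CHARARRAY,
           last_name:CHARARRAY,
           email_address:CHARARRAY);
filtered_records = FILTER records
                   BY email_address matches '.*@isp.com';
STORE filtered_records INTO '/user/isp_customers';
Once the script completes, the output can be inspected directly from HDFS,
for example with hdfs dfs -cat /user/isp_customers/part-*.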
Such Pig instructions are translated, behind the scenes, into one or more
MapReduce jobs. Thus, Pig simplifies the coding of a MapReduce job and enables
the user to quickly develop, test, and debug the Pig code. In this particular
example, the MapReduce job would be initiated only after the STORE command is
processed; the earlier LOAD and FILTER statements merely build up the
execution plan.
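Two built-in Pig commands are helpful for observing this behavior during
development: EXPLAIN prints the logical, physical, and MapReduce plans that
Pig generates for a relation, and DUMP forces execution and prints the
resulting records to the console instead of writing them to HDFS:
grunt> EXPLAIN filtered_records;
grunt> DUMP filtered_records;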