sample tweets file from the attachment folder (datafiles). This file contains the tweet
date, screen name, and tweet body, delimited by '\001' (see Figure 6-11).
Figure 6-11. The sample tweet file delimited by '\001'
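To make the file layout concrete, here is a minimal Java sketch of how one line of such a file splits into its three fields. The class name TweetLineSplit, the parse helper, and the sample record are hypothetical and only for illustration; the only detail taken from the text is the '\001' (Ctrl-A) delimiter and the order of the fields.

```java
public class TweetLineSplit {
    // Ctrl-A (code point 1), the same character the text writes as '\001'.
    static final String DELIMITER = "\u0001";

    // Splits one line into at most three fields: date, screen name, body.
    // The limit of 3 keeps any later delimiter characters inside the body.
    static String[] parse(String line) {
        return line.split(DELIMITER, 3);
    }

    public static void main(String[] args) {
        // Made-up sample record; the real file's date format is not shown here.
        String line = String.join(DELIMITER, "2013-01-01", "example_user", "Hello from Pig");
        String[] fields = parse(line);
        System.out.println("date        = " + fields[0]);
        System.out.println("screen_name = " + fields[1]);
        System.out.println("body        = " + fields[2]);
    }
}
```

Because Ctrl-A is a non-printing character that rarely occurs in ordinary text, the tweet body can safely contain commas, tabs, and spaces.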
Counting Tweets
In this example we will run various Pig commands using the interactive Grunt
shell. For tasks of medium or higher complexity, we may prefer to save the
statements to a file and run them as a Pig script instead. The command to run a
Pig script in local mode is as follows:
pig -x local myscript.pig
Here myscript.pig is a text file containing Pig Latin statements. We can also
execute Pig statements in embedded mode, from within a Java program, as follows:
# Compile to a .class file
javac -cp pig.jar MyScript.java
# Run the Pig statements as a Java program in embedded mode
java -cp pig.jar:. MyScript
In this exercise, we will use Apache Pig to run MapReduce jobs that compute the
total tweet count and the number of tweets for a specific screen_name.
1. First, load the tweets using PigStorage:
tweets = LOAD '/home/vivek/tweets' USING PigStorage('\u0001') AS
(date:chararray, screen_name:chararray, body:chararray);
2. Next, filter the tweets for the screen name The News Selector:
name = FILTER tweets BY screen_name matches 'The News Selector';
 
 