Figure 6-7. The output of reading and storing tweets in a CSV file
The output of the preceding command is shown in Figure 6-7.
Upon running the preceding command, a new file (part-m-00000) will be written
to the /home/vivek/output folder. It contains comma-separated values such as:
apress_team,technology,A whole bunch of new technology topics about to come. Watch this space!
Such part files are the output of the MapReduce job that Pig runs for the
script. For more details about MapReduce, please refer to the previous chapter.
Please note that here we have used the default storage function, PigStorage(),
but readers may write their own UDFs and use them to store and load data. For example,
in the case of Cassandra, to load data into the Cassandra file system, the CSVStorage
and CassandraStorage functions will be used. We will discuss Cassandra's Pig-specific
functions in coming exercises.
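As a minimal sketch of how PigStorage() works in both directions, the stored file can be read back by passing the same delimiter to LOAD. The schema below is an assumption: only screen_name is named in the text, so hashtag and body are hypothetical field names, and /home/vivek/output_tab is a hypothetical target path.

```pig
-- Read the stored part file back, splitting each line on commas.
-- Assumed schema: hashtag and body are hypothetical field names.
stored_tweets = LOAD '/home/vivek/output/part-m-00000'
    USING PigStorage(',')
    AS (screen_name:chararray, hashtag:chararray, body:chararray);

-- Print the loaded tuples to the console.
DUMP stored_tweets;

-- Store the same relation again, this time tab-delimited.
-- The target path is hypothetical and must not already exist.
STORE stored_tweets INTO '/home/vivek/output_tab' USING PigStorage('\t');
```

PigStorage() splits on every occurrence of the delimiter and does not interpret quotes, so a tweet whose text contains a comma would be broken across fields when read back this way. A delimiter that cannot occur in the data, such as a tab, avoids that problem.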
FILTER
FILTER selects the tuples (rows) of a relation that satisfy a given condition. Let's
describe pipe_input and filter it by screen_name for the value apress_team:
describe pipe_input;
filter_by_name = FILTER pipe_input BY screen_name matches 'apress_team';
Running these commands keeps only the tuples of pipe_input whose screen_name
is apress_team (see Figure 6-8). Note that matches treats its right-hand side
as a regular expression that must match the entire value.
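The condition after BY can take other forms as well. The following sketch, which assumes that pipe_input has a chararray field named screen_name as in the example above, shows an exact comparison, a regular-expression match, and a negated condition. The relation names on the left-hand side are illustrative.

```pig
-- Exact string comparison: the usual choice when no pattern is needed.
filter_exact = FILTER pipe_input BY screen_name == 'apress_team';

-- Regular-expression match: the pattern must cover the whole value,
-- so 'apress.*' keeps every screen_name that starts with "apress".
filter_prefix = FILTER pipe_input BY screen_name matches 'apress.*';

-- Negation: keep every tuple whose screen_name is not apress_team.
filter_others = FILTER pipe_input BY NOT (screen_name == 'apress_team');

-- Print one of the filtered relations to the console.
DUMP filter_exact;
```

Conditions can also be combined with AND and OR, which keeps several checks in a single FILTER statement.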