Figure 6-7. The output of reading and storing tweets in a CSV file
The output of the preceding command is shown in Figure 6-7.
Running the preceding command writes a new file (part-m-00000) to the /home/vivek/output folder. The file contains comma-separated values such as:
apress_team,technology,A whole bunch of new technology topics about to come. Watch this space!
Such files are output files generated by the MapReduce job executed with the Pig
script. For more details about MapReduce, please refer to the previous chapter.
Please note that here we have used the default storage function, PigStorage(), but readers may write their own UDFs and use them to store and load data. For example, in the case of Cassandra, the CSVStorage and CassandraStorage functions are used to load data into the Cassandra file system. We will discuss Cassandra's Pig-specific functions in the coming exercises.
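As a minimal sketch of how a storage function is named explicitly on both the load and the store side, the statements below read the comma-separated output back and write it out again with a different delimiter. Several details are assumptions made for illustration only: the field names hash_tag and body (the text names only screen_name), the relation name stored_tweets, and the target folder /home/vivek/output_pipe.

```pig
-- Sketch only. PigStorage takes the field delimiter as its argument;
-- when no argument is given, the delimiter is a tab.
-- hash_tag and body are hypothetical field names; only screen_name
-- is named in the text.
stored_tweets = LOAD '/home/vivek/output' USING PigStorage(',')
    AS (screen_name:chararray, hash_tag:chararray, body:chararray);

-- Print the tuples to the console to check that the fields line up.
DUMP stored_tweets;

-- Write the same rows with a pipe delimiter instead of a comma.
-- The target folder is hypothetical and must not already exist,
-- because Pig refuses to overwrite an existing output location.
STORE stored_tweets INTO '/home/vivek/output_pipe' USING PigStorage('|');
```

Note that PigStorage splits each line on the delimiter without any quoting rules, so a tweet body that itself contains a comma would be split into extra fields when loaded this way. A delimiter that cannot occur in the data, or a custom load/store UDF, avoids that problem.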
FILTER
FILTER is used for row (tuple) selection based on the provided condition. Let's describe pipe_input and filter it by screen_name for the value apress_team:
describe pipe_input;
filter_by_name = FILTER pipe_input BY screen_name matches 'apress_team';
Running this command will filter pipe_input for screen_name instances