Data Migration and Analytics - Beginning Apache Cassandra Development

Figure 6-7. The output of reading and storing tweets in a CSV file

The output of the preceding command is shown in Figure 6-7.

Running the preceding command writes a new file (part-m-00000) to the /home/vivek/output folder. The file contains comma-separated values such as:

apress_team,technology,A whole bunch of new technology topics about to come. Watch this space!

Files like this one are the output of the MapReduce job that is executed for the Pig script. For more details about MapReduce, please refer to the previous chapter.

Note that here we have used the default storage function, PigStorage(), but readers may write their own UDFs and use them to load and store data. For example, in the case of Cassandra, the CSVStorage and CassandraStorage functions are used to load data into the Cassandra file system. We will discuss Cassandra's Pig-specific functions in the coming exercises.
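To make the role of the storage function concrete, here is a minimal sketch that reads the file written above back in and stores it again with a different delimiter. It assumes the three-field layout suggested by the sample row; the field names category and body, the alias stored_tweets, and the /home/vivek/output_pipe directory are hypothetical and do not come from the original exercise.

```pig
-- Sketch only: field names category and body are assumptions based on the
-- sample row; only screen_name is named in the text.
-- PigStorage takes the field delimiter as its argument (tab if omitted).
stored_tweets = LOAD '/home/vivek/output/part-m-00000' USING PigStorage(',')
    AS (screen_name:chararray, category:chararray, body:chararray);

-- Store the same tuples with a pipe delimiter instead of a comma.
-- The target directory is hypothetical and must not already exist.
STORE stored_tweets INTO '/home/vivek/output_pipe' USING PigStorage('|');
```

PigStorage splits on every occurrence of the delimiter and does not quote fields, so a tweet body that itself contains a comma would be split into extra fields on load. A delimiter that cannot appear in the data may therefore be the safer choice.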

FILTER

FILTER selects rows (tuples) based on a provided condition. Let's describe pipe_input and filter it by screen_name for the value apress_team:

describe pipe_input;

filter_by_name = FILTER pipe_input BY screen_name matches 'apress_team';

Running this command filters pipe_input, keeping only the tuples whose screen_name has the value apress_team (see Figure 6-8).
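The matches operator compares the whole field value against a Java regular expression, so for a literal value such as apress_team it behaves like an equality test. The following sketch shows both forms along with a pattern-based filter. It assumes pipe_input is still defined in the current Grunt session; the aliases filter_by_equality and filter_by_prefix are hypothetical names introduced for illustration.

```pig
-- Regular-expression match against the whole value (as in the text above).
filter_by_name = FILTER pipe_input BY screen_name matches 'apress_team';

-- For a literal value, a plain equality test selects the same tuples.
filter_by_equality = FILTER pipe_input BY screen_name == 'apress_team';

-- A pattern: every screen_name that starts with "apress".
filter_by_prefix = FILTER pipe_input BY screen_name matches 'apress.*';

-- Print the selected tuples to the console.
DUMP filter_by_prefix;
```

When only an exact value is needed, the equality form states the intent more directly and avoids surprises from regular-expression metacharacters in the value.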
