CHAPTER 5
MapReduce with Cassandra
So, what comes next after data modeling, security, and the management of user roles and privileges? With the Cassandra Query Language (CQL), we can handle basic query-based analytics through primary keys and secondary indexes, keeping the data model as denormalized as possible. But what if we need to run analytics over a very large chunk of data? Examples include operations similar to joins, or counting specific fields and persisting the results back into Cassandra, such as counting the tweets of a particular user account for a given date range. This is clearly a case of large data analytics, more specifically batch analytics.
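To make the tweet-counting example concrete, here is a minimal, in-memory sketch of the MapReduce data flow in plain Python. It does not use Hadoop or Cassandra APIs. The tweet records are hypothetical `(user, date, text)` tuples, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `count_tweets` are illustrative only.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tweet record: (user, tweet_date, text). This is not a Cassandra schema.
Tweet = Tuple[str, date, str]


def map_phase(tweets: Iterable[Tweet], start: date, end: date) -> Iterator[Tuple[str, int]]:
    """Map: emit (user, 1) for every tweet whose date falls within [start, end]."""
    for user, tweet_date, _text in tweets:
        if start <= tweet_date <= end:
            yield user, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the per-tweet counts for each user."""
    return {user: sum(counts) for user, counts in groups.items()}


def count_tweets(tweets: Iterable[Tweet], start: date, end: date) -> Dict[str, int]:
    """Chain the three phases to count tweets per user in a date range."""
    return reduce_phase(shuffle_phase(map_phase(tweets, start, end)))


if __name__ == "__main__":
    sample = [
        ("alice", date(2014, 1, 5), "hello"),
        ("bob", date(2014, 1, 6), "hi"),
        ("alice", date(2014, 1, 20), "again"),
        ("alice", date(2014, 3, 1), "outside the range"),
    ]
    print(count_tweets(sample, date(2014, 1, 1), date(2014, 1, 31)))
    # prints {'alice': 2, 'bob': 1}
```

In a real Hadoop job, the framework performs the shuffle across the cluster, and many map and reduce tasks run in parallel on separate nodes. The sketch above only illustrates how records flow through the three phases.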
In this chapter we will:
• Provide an introduction to MapReduce
• Explore Hadoop
• Discuss HDFS and MapReduce
• Describe integrating Cassandra with MapReduce
Batch Processing and MapReduce
Any form of data, structured or unstructured, is meaningless unless it gets processed. So far we have discussed various ways to manage and model large volumes of data in Cassandra.
What about running analytics over such large archived data sets? Large data analytics can be divided into two broad categories:
• Batch processing