CHAPTER 5
MapReduce with Cassandra
So, what's next after discussing data modeling, security, and user role privileges management? With Cassandra Query Language (CQL), we can handle basic query-based analytics through primary keys and secondary indexes, and we can keep the data model as denormalized as possible. Still, we may need to perform analytics over a very large chunk of data in a manner similar to joins, or to persist data into Cassandra after counting specific fields, such as counting the tweets of a particular user account for a given date range. This is clearly a case of large data analytics, and more specifically of batch analytics.
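To make the tweet-counting example concrete, here is a minimal, single-process Python sketch of the map, shuffle, and reduce steps that such a batch job performs. It does not use any Hadoop or Cassandra API. The sample records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_tweets`) are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (user, tweet_date, text). In a real job these
# would be rows read from a Cassandra table, not an in-memory list.
tweets = [
    ("alice", date(2013, 1, 5), "hello"),
    ("bob", date(2013, 1, 6), "hi"),
    ("alice", date(2013, 1, 20), "again"),
    ("alice", date(2013, 2, 1), "outside range"),
]

def map_phase(record, start, end):
    """Emit (user, 1) for each tweet whose date falls within [start, end]."""
    user, tweet_date, _text = record
    if start <= tweet_date <= end:
        yield user, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the 1s emitted for a single user."""
    return key, sum(values)

def count_tweets(records, start, end):
    """Run map, shuffle, and reduce in sequence and return {user: count}."""
    pairs = (pair for record in records for pair in map_phase(record, start, end))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = count_tweets(tweets, date(2013, 1, 1), date(2013, 1, 31))
print(counts)  # {'alice': 2, 'bob': 1}
```

In a distributed framework, the map calls run in parallel across data partitions and the shuffle moves each key's values to a single reducer. The sketch keeps those steps as separate functions so that this division of work stays visible.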
In this chapter we will:
Provide an introduction to MapReduce
Explore Hadoop
Discuss HDFS and MapReduce
Describe integrating Cassandra with MapReduce
Batch Processing and MapReduce
Any form of data, structured or unstructured, is meaningless unless it gets processed. So far we have discussed various ways to manage and model large volumes of data in Cassandra.
What about running analytics over such large archived data sets? Large data analytics can be divided into two broad categories:
Batch processing