MapReduce with Cassandra - Beginning Apache Cassandra Development

CHAPTER 5

MapReduce with Cassandra

So, what's next after discussing data modeling, security, and user role and privilege management? With Cassandra Query Language (CQL), we can certainly handle basic query-based analytics through primary keys and secondary indexes while keeping the data model as denormalized as possible. Still, some requirements go further: performing analytics over a very large chunk of data in a manner similar to joins, or persisting data into Cassandra after counting specific fields, such as counting the tweets of a particular user account for a given date range. Clearly this is a case of large data analytics, more specifically batch analytics.
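To make the tweet-counting case concrete, here is a minimal in-memory sketch of the map, shuffle, and reduce steps in plain Python. It does not use Hadoop or Cassandra; the sample records and the names `map_tweet`, `shuffle`, `reduce_counts`, and `count_tweets` are assumptions made up for illustration only.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (user, tweet date). Not taken from the book.
TWEETS = [
    ("alice", date(2014, 1, 3)),
    ("bob", date(2014, 1, 4)),
    ("alice", date(2014, 1, 9)),
    ("alice", date(2014, 2, 1)),
]


def map_tweet(tweet, start, end):
    """Map step: emit (user, 1) for each tweet inside the date range."""
    user, day = tweet
    if start <= day <= end:
        yield user, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(user, values):
    """Reduce step: sum the values emitted for a single user."""
    return user, sum(values)


def count_tweets(tweets, start, end):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    pairs = (pair for t in tweets for pair in map_tweet(t, start, end))
    return dict(reduce_counts(u, vs) for u, vs in shuffle(pairs).items())


counts = count_tweets(TWEETS, date(2014, 1, 1), date(2014, 1, 31))
```

In a real batch job the map and reduce functions would run in parallel across many nodes and the resulting counts would be written back to a Cassandra table; this sketch only shows the shape of the computation.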

In this chapter we will:

• Provide an introduction to MapReduce
• Explore Hadoop
• Discuss HDFS and MapReduce
• Describe integrating Cassandra with MapReduce

Batch Processing and MapReduce

Any form of data, structured or unstructured, is meaningless until it is processed. So far we have discussed various ways to model and manage large volumes of data in Cassandra.

What about running analytics over such archived large data sets? Large data analytics can be divided into two broad categories:

• Batch processing
