CHAPTER 6
Data Migration and Analytics
In the previous chapter, we discussed the benefits of and requirements for running batch
analytics over Cassandra via Hadoop MapReduce. Hadoop MapReduce's pluggable
architecture also allows us to run MapReduce jobs with custom implementations.
Let's look at a few published use cases:
NetApp collects and analyzes system diagnostic data to improve the quality
of systems deployed at client sites.
A leading health insurance provider collects and processes millions of
claims per day, ingesting approximately 1 TB of data daily.
Nokia collects and analyzes data related to various mobile phones; the
expected data volume is about 600 TB, in both structured and unstructured
forms.
Etsy, an online marketplace for handmade items, needs to analyze billions
of log entries for behavioral targeting and for building search-based
recommendations.
Storing large amounts of data is a significant challenge, and with Cassandra we can
achieve fast ingestion. But what about analyzing these large datasets? CQL3 comes in
very handy as an SQL-like interface, but to process and analyze data in parallel batches,
we need MapReduce-style algorithms. Previous chapters covered MapReduce basics and
implementing MapReduce in Java, but in many cases we prefer ready-to-use,
easy-to-integrate solutions.
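As a quick refresher before turning to those higher-level tools, the MapReduce pattern the earlier chapters implement in Java can be sketched in a few lines. The following is a minimal, single-process Python illustration (not Hadoop or Cassandra code), counting words across log lines to show the map, shuffle, and reduce phases:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values; here, sum the counts.
    return {word: sum(counts) for word, counts in groups.items()}

logs = ["error disk full", "warn disk slow", "error net down"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"])  # 2
```

In a real Hadoop job these three phases run in parallel across the cluster, with the framework handling the shuffle; tools such as Pig generate equivalent jobs from a higher-level script.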
In this chapter we will discuss:
Apache Pig setup and basics