CHAPTER 6
Data Migration and Analytics
In the previous chapter, we discussed the benefits of and requirements for running batch
analytics over Cassandra via Hadoop MapReduce. Hadoop MapReduce's pluggable
architecture also allows us to run MapReduce jobs with custom implementations.
Let's look at a few published use cases:
NetApp collects and analyzes system diagnostic data to improve the quality
of systems deployed at client sites.
A leading health insurance provider collects and processes millions of
claims per day, ingesting approximately 1 TB of data daily.
Nokia collects and analyzes data related to various mobile phones; the
expected data volume is about 600 TB, in both structured and unstructured
forms.
Etsy, an online marketplace for handmade items, needs to analyze billions
of log entries for behavioral targeting and for building search-based
recommendations.
Storing large amounts of data is a significant challenge, and with Cassandra we can
achieve fast ingestion. But what about analyzing these large datasets? CQL3 comes in
very handy as an SQL-like interface, but to process and analyze data in parallel batches,
we need MapReduce-style algorithms. Previous chapters covered MapReduce basics and
implementing MapReduce in Java, but in many cases we prefer ready-to-use,
easy-to-integrate solutions.
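As a quick refresher before turning to those higher-level tools, the MapReduce pattern the earlier chapters implement in Java can be sketched in a few lines. The following is a minimal, single-process Python illustration (not Hadoop or Cassandra code), counting words across log lines to show the map, shuffle, and reduce phases:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values; here, sum the counts.
    return {word: sum(counts) for word, counts in groups.items()}

logs = ["error disk full", "warn disk slow", "error net down"]
counts = reduce_phase(shuffle(map_phase(logs)))
print(counts["error"])  # 2
```

In a real Hadoop job these three phases run in parallel across the cluster, with the framework handling the shuffle; tools such as Pig generate equivalent jobs from a higher-level script.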
In this chapter we will discuss:
Apache Pig setup and basics