MapReduce with Cassandra - Beginning Apache Cassandra Development


The complete source code of this MapReduce job can be found with the downloads for this topic. The executable class is TwitterCassandraJob (com.apress.chapter5.mapreduce.twittercount.cassandra package). You may also refer to the README.txt and db.txt files (under src/main/resources) for further instructions.

After the job executes successfully, its output in the tweetcount column family looks like Figure 5-6.

Figure 5-6. Counts specific to user mevivs are stored in tweetcount
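To make the map and reduce roles concrete, here is a minimal in-memory sketch of a per-user tweet count like the one stored in tweetcount. It is not the book's TwitterCassandraJob: it assumes tweets arrive as hypothetical (user, body) records and leaves out all Hadoop and Cassandra wiring. The map phase emits (user, 1) for each tweet, and the reduce phase sums the values per user.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory sketch of the map and reduce phases of a per-user
// tweet count. The real TwitterCassandraJob runs on Hadoop and reads/writes
// Cassandra column families; none of that wiring is reproduced here.
public class TweetCountSketch {

    // A tweet record: author and body (hypothetical shape, not the book's schema).
    record Tweet(String user, String body) {}

    // A key/value pair emitted by the map phase.
    record Pair(String key, int value) {}

    // Map phase: emit (user, 1) for every tweet.
    static List<Pair> map(List<Tweet> tweets) {
        List<Pair> out = new ArrayList<>();
        for (Tweet t : tweets) {
            out.add(new Pair(t.user(), 1));
        }
        return out;
    }

    // Shuffle + reduce phase: group pairs by key and sum their values.
    // The resulting map stands in for the rows written to tweetcount.
    static Map<String, Integer> reduce(List<Pair> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Pair p : pairs) {
            counts.merge(p.key(), p.value(), Integer::sum);
        }
        return counts;
    }

    static Map<String, Integer> countByUser(List<Tweet> tweets) {
        return reduce(map(tweets));
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
                new Tweet("mevivs", "Cassandra and Hadoop"),
                new Tweet("apress", "New title out"),
                new Tweet("mevivs", "MapReduce over column families"));
        System.out.println(countByUser(tweets)); // prints {apress=1, mevivs=2}
    }
}
```

In the actual job, Hadoop performs the grouping between the two phases and the reducer's output is written to Cassandra rather than returned as a map.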

The complete source code for this recipe is available under the com.apress.chapter5.mapreduce.twittercount.cassandra folder in the downloads for this topic.

Stream or Real-Time Analytics

Batch processing frameworks are a good fit for a write-once/read-everywhere paradigm. But when the data set is continuously updated, an in-progress Hadoop job will not pick up those updates; the job must be rerun to reflect them.
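The staleness problem can be illustrated with a deliberately simplified model. It assumes that a batch job fixes its input when it is submitted, so a record written while the job is running is not counted until a new run. The class and method names below are hypothetical, and this is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a batch job that fixes its input at submission time,
// so updates arriving while it runs are only reflected by a rerun.
public class BatchSnapshotSketch {

    static final class BatchJob {
        private final List<String> input;

        // Simplified model: the input is copied when the job is submitted.
        BatchJob(List<String> liveDataSet) {
            this.input = List.copyOf(liveDataSet);
        }

        // Counts records in the fixed input, not in the live data set.
        int run() {
            return input.size();
        }
    }

    public static void main(String[] args) {
        List<String> tweets = new ArrayList<>(List.of("t1", "t2"));
        BatchJob job = new BatchJob(tweets); // job submitted
        tweets.add("t3");                    // update arrives while the job is in progress
        System.out.println(job.run());                 // prints 2: t3 is missed
        System.out.println(new BatchJob(tweets).run()); // prints 3: a rerun picks it up
    }
}
```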

Real-time analytics requires processing and analyzing massive amounts of data as it enters the system. Applications such as stock market trading and dynamic predictive analysis need analytics delivered in real time, as the data is being processed by the system.
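By contrast, a streaming approach updates its result as each event arrives instead of recomputing over the whole data set. The sketch below is a hypothetical single-process counter, assuming one callback per incoming tweet; real stream processors add partitioning, fault tolerance, and windowing, which are omitted.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of real-time (incremental) counting: each event updates
// the running result as it enters the system, so no rerun is needed.
public class StreamingCountSketch {

    private final Map<String, Long> countsByUser = new TreeMap<>();

    // Called once per incoming tweet; the running count is updated immediately.
    void onTweet(String user) {
        countsByUser.merge(user, 1L, Long::sum);
    }

    // Current count for a user, or 0 if no tweets have been seen.
    long countFor(String user) {
        return countsByUser.getOrDefault(user, 0L);
    }

    public static void main(String[] args) {
        StreamingCountSketch counter = new StreamingCountSketch();
        counter.onTweet("mevivs");
        counter.onTweet("apress");
        System.out.println(counter.countFor("mevivs")); // prints 1
        counter.onTweet("mevivs"); // a new event arrives; the result is current at once
        System.out.println(counter.countFor("mevivs")); // prints 2
    }
}
```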

In the last year or so, there has been significant interest in building such real-time analytics applications. As a result, a number of new frameworks have emerged, such as Storm, Samza, and Kafka. We will discuss their integration in subsequent chapters.
