The complete source code of this MapReduce job can be found with the downloads
for this topic. The executable class is TwitterCassandraJob (in the
com.apress.chapter5.mapreduce.twittercount.cassandra package). You may also
refer to the README.txt and db.txt files (under src/main/resources) for
further instructions.
After successfully executing the job, the output in the tweetcount column family
is shown in Figure 5-6.
Figure 5-6. Counts specific to user mevivs are stored in tweetcount
Complete source code for this recipe is available under the
com.apress.chapter5.mapreduce.twittercount.cassandra folder in
the downloads for this topic.
Stream or Real-Time Analytics
Batch-processing frameworks are a good fit for a write-once/read-everywhere paradigm.
When the data set is updated continuously, however, a Hadoop job that is already in
progress will not pick up those updates; the job has to be rerun.
Real-time analytics requires processing and analyzing a massive amount of
data as it enters the system. Applications such as stock market trading and dynamic
predictive analysis need analytics delivered in real time, as the data arrives
in the system.
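The difference between the two styles can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class TweetCountSketch and its methods are hypothetical names, they are not part of the downloads, and no Hadoop or Cassandra API is used. A batch count scans a fixed snapshot, so later tweets are invisible until a rerun, whereas a streaming count updates its running totals as each tweet arrives.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the book's downloads.
final class TweetCountSketch {

    // Batch style: count tweets per user over a fixed snapshot.
    // Tweets added after the snapshot is taken are not seen until a rerun.
    static Map<String, Integer> batchCount(List<String> snapshot) {
        Map<String, Integer> counts = new HashMap<>();
        for (String user : snapshot) {
            counts.merge(user, 1, Integer::sum);
        }
        return counts;
    }

    // Streaming style: running totals that are updated once per event.
    private final Map<String, Integer> running = new HashMap<>();

    void onTweet(String user) {
        running.merge(user, 1, Integer::sum);
    }

    int countFor(String user) {
        return running.getOrDefault(user, 0);
    }

    public static void main(String[] args) {
        List<String> snapshot = Arrays.asList("mevivs", "mevivs", "other");
        System.out.println(batchCount(snapshot).get("mevivs")); // prints 2

        TweetCountSketch stream = new TweetCountSketch();
        for (String user : snapshot) {
            stream.onTweet(user);
        }
        stream.onTweet("mevivs"); // a late arrival is counted immediately
        System.out.println(stream.countFor("mevivs")); // prints 3
    }
}
```

In the batch case, the late tweet would only show up after batchCount is run again over a fresh snapshot; the streaming counter reflects it as soon as onTweet is called.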
In the last year or so, there has been significant interest in building such real-time
analytics applications. As a result, a number of new frameworks have emerged, such as
Storm, Samza, and Kafka. We will discuss their integration in subsequent chapters.
 
 