The complete source code of this MapReduce job can be found with the downloads for this topic. The executable class is TwitterCassandraJob (com.apress.chapter5.mapreduce.twittercount.cassandra package). You may also refer to the README.txt and db.txt files (under src/main/resources) for further instructions.
After the job executes successfully, the output in the tweetcount column family is as shown in Figure 5-6.
Figure 5-6. Counts specific to user mevivs are stored in tweetcount
Complete source code for this recipe is available under the com.apress.chapter5.mapreduce.twittercount.cassandra folder in the downloads for this topic.
Stream or Real-Time Analytics
Batch-processing frameworks are a good fit for a write-once/read-everywhere paradigm. But when the data set is continuously updated, a Hadoop job that is already running will not pick up those updates; the job must be rerun to include them.
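The snapshot behavior described above can be illustrated with a small plain-Java sketch. It is not Hadoop code: it simply assumes that a batch job reads the input as it exists when the job starts, modeled here as an immutable copy of a list. The class name BatchSnapshotDemo and the user "alice" are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not Hadoop API code: a batch job is modeled as a
// function over an immutable snapshot of the input taken when the job starts.
public class BatchSnapshotDemo {

    // Counts tweets per user over a fixed snapshot of the input.
    public static Map<String, Long> batchCount(List<String> tweetAuthors) {
        Map<String, Long> counts = new HashMap<>();
        for (String user : tweetAuthors) {
            counts.merge(user, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> liveData = new ArrayList<>(List.of("mevivs", "mevivs", "alice"));

        // The "job" starts here: its input is frozen at this point.
        List<String> snapshot = List.copyOf(liveData);

        // New data arrives while the job is (conceptually) still running.
        liveData.add("mevivs");

        // The running job only sees the snapshot, so it misses the update.
        System.out.println("first run:   " + batchCount(snapshot).get("mevivs"));

        // Only a rerun over the current data picks up the new tweet.
        System.out.println("after rerun: " + batchCount(List.copyOf(liveData)).get("mevivs"));
    }
}
```

Running main prints 2 for the first run and 3 after the rerun, which mirrors the need to resubmit a batch job to reflect new data.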
Real-time analytics requires processing and analyzing massive amounts of data as it enters the system. Applications such as stock market trading and dynamic predictive analysis need analytics delivered in real time, as the data is ingested into the system.
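By contrast, a streaming approach updates its results as each record arrives. The following plain-Java sketch is a hypothetical illustration of that idea only; it does not use Storm, Samza, or Kafka APIs, and the class and method names (StreamingTweetCounter, onTweet, countFor) are invented for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not a stream-processing framework API:
// each event updates the running total as soon as it arrives, so the
// latest count is available without rerunning a job.
public class StreamingTweetCounter {

    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Called once per incoming tweet event.
    public void onTweet(String user) {
        counts.merge(user, 1L, Long::sum);
    }

    // Returns the current count for a user (0 if none seen yet).
    public long countFor(String user) {
        return counts.getOrDefault(user, 0L);
    }

    public static void main(String[] args) {
        StreamingTweetCounter counter = new StreamingTweetCounter();
        String[] incoming = {"mevivs", "alice", "mevivs"};
        for (String user : incoming) {
            counter.onTweet(user);
            // The answer is up to date after every event, not after a batch completes.
            System.out.println(user + " -> " + counter.countFor(user));
        }
    }
}
```

A ConcurrentHashMap is used so that, in a multithreaded consumer, concurrent merge calls on the same user key remain atomic; a real streaming framework would also handle partitioning, fault tolerance, and durable state.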
In the last year or so, there has been significant interest in building such real-time analytics applications. As a result, a number of new frameworks have emerged, such as Storm, Samza, and Kafka. We will discuss their integration in subsequent chapters.