Oseias Moraes | 334
Adeesh Fulay | 334
Sat Nov 05 21:29:20 IST 2011 | 334
Sun Jul 24 21:44:17 IST 2011 | 334
Manthita. | 334
ebooksdealofdaybot | 334
Wed Apr 23 19:19:49 IST 2014 | 334
The News Selector | 6680
Louise Corrigan | 334
Mon Mar 03 01:19:17 IST 2014 | 6680
22 Rows Returned.
The complete source code is available with the downloads for this chapter, and the
classes discussed are
com.apress.chapter5.mapreduce.twittercount.hdfs.TwitterHDFSCQLJob
com.apress.chapter5.mapreduce.twittercount.hdfs.TweetAggregator
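The downloadable TweetAggregator class is the authoritative reducer for this job. Purely as
an illustration of the contract that CqlOutputFormat expects from a reducer, a hypothetical
sketch follows; the partition key column name (user), the single bound variable feeding a
tweet_count column, and the int counter are assumptions, not the actual schema used in the
chapter's code.

// Hypothetical reducer sketch: sums per-user tweet counts and emits them in the
// Map<String, ByteBuffer> / List<ByteBuffer> form that CqlOutputFormat expects.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TweetCountReducer
        extends Reducer<Text, IntWritable, Map<String, ByteBuffer>, List<ByteBuffer>> {

    @Override
    protected void reduce(Text user, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable c : counts) {
            total += c.get();
        }

        // Partition key column(s) of the output table, keyed by column name.
        Map<String, ByteBuffer> keys = new HashMap<>();
        keys.put("user", ByteBufferUtil.bytes(user.toString()));

        // Bound variables for the CQL statement configured on the job, e.g.
        // "UPDATE twitter_keyspace.tweetcount SET tweet_count = ?"; the record
        // writer appends the WHERE clause from the keys map.
        List<ByteBuffer> variables =
                Collections.singletonList(ByteBufferUtil.bytes(total));

        context.write(keys, variables);
    }
}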
In the next section we will discuss using Cassandra as both the input and output format
for MapReduce.
Cassandra In and Cassandra Out
Let's discuss running a MapReduce job where the input is fetched from Cassandra and
the output is also stored back into Cassandra.
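Before weighing the trade-offs, here is a minimal configuration sketch for such a job,
assuming a Cassandra 2.1-style CqlInputFormat (older releases ship CqlPagingInputFormat
instead). The contact point, keyspace, table, and column names are placeholders, and the
mapper and reducer classes are omitted; the chapter's downloadable source is the
definitive version.

// Hypothetical driver sketch: both the input and output tables live in Cassandra.
// Keyspace, table, and column names below are illustrative only.
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.cassandra.hadoop.cql3.CqlOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CassandraInOutJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "tweet count: Cassandra in and out");
        job.setJarByClass(CassandraInOutJob.class);
        Configuration conf = job.getConfiguration();

        // Input side: read rows from a Cassandra table with CqlInputFormat.
        job.setInputFormatClass(CqlInputFormat.class);
        ConfigHelper.setInputInitialAddress(conf, "localhost");
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner");
        ConfigHelper.setInputColumnFamily(conf, "twitter_keyspace", "tweets");
        // Page size controls how many CQL rows are fetched per request.
        CqlConfigHelper.setInputCQLPageRowSize(conf, "100");

        // Mapper and reducer classes (job.setMapperClass/setReducerClass) are omitted;
        // the record types the mapper receives depend on the Cassandra version in use.

        // Output side: write aggregated results back to Cassandra via a bound CQL update.
        job.setOutputFormatClass(CqlOutputFormat.class);
        ConfigHelper.setOutputInitialAddress(conf, "localhost");
        ConfigHelper.setOutputPartitioner(conf, "Murmur3Partitioner");
        ConfigHelper.setOutputColumnFamily(conf, "twitter_keyspace", "tweetcount");
        CqlConfigHelper.setOutputCql(conf,
                "UPDATE twitter_keyspace.tweetcount SET tweet_count = ?");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}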
So far we have seen that MapReduce jobs can run over the default HDFS as well as
over an external data store such as Cassandra. You may be wondering which one to
adopt, and why. It depends on the use case. For example, if an application has already
been built using various Cassandra features, it is better to implement its
MapReduce-based batch analytics over Cassandra as well. There are also use cases
where HDFS is already used for storing raw data and the user does not want to migrate
it, but still wants to run a few MapReduce jobs and store the output in Cassandra.
Similarly, the user may want to migrate away from HDFS and its ecosystem (Hive, Pig,
and so forth) to a single database solution (i.e., Cassandra). One big difference we must
remember is that HDFS is a distributed file system, whereas Cassandra is a distributed
database. Cassandra is fault-tolerant and doesn't have a single point of failure.