Integration with Hadoop - Mastering Apache Cassandra

Integration with other analytical tools

Although Hadoop and its companion projects are the most widely used tools for the analysis of large datasets, the recent demand for real-time analytics and machine learning has produced several other very successful tools. Some of these tools store their own data: MongoDB is basically a database but also provides decent built-in analytics tooling, and Druid ( https://github.com/metamx/druid ) claims to be a column store (like Cassandra) with fast analytical tooling. Others, such as Twitter Storm ( https://storm.apache.org/ ), which provides real-time stream analysis, and Spark or Shark ( https://spark.apache.org/ ), do not have their own data store, but databases can be plugged into their respective frameworks to get them working. The scope of this chapter does not allow us to discuss the how-to for all this software; however, getting them to work with Cassandra is not especially painful.

Storm can easily be integrated with Cassandra by writing read or write code using the Cassandra driver in its Spout and/or Bolt definitions. This is probably the easiest approach. One may also look into a somewhat older Cassandra-Storm integration project.
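As a rough illustration of the driver-in-a-Bolt approach, the sketch below shows a bolt-like class that writes each incoming tuple to Cassandra. It is only a sketch under assumptions: the class name CassandraWriterBolt, the process method, the word_counts table, and the analytics keyspace are hypothetical and do not come from Storm or from this chapter. The only driver calls relied on are Cluster(...).connect(...) and session.execute(query, parameters) from the DataStax Python driver, which is imported lazily so the class can be exercised without a running cluster.

```python
from typing import Any, Sequence, Tuple


def open_session(hosts: Sequence[str], keyspace: str) -> Any:
    """Open a Cassandra session with the DataStax Python driver.

    The import is done here so the rest of the module loads even when
    the driver is not installed (for example, in unit tests).
    """
    from cassandra.cluster import Cluster  # pip install cassandra-driver

    return Cluster(list(hosts)).connect(keyspace)


class CassandraWriterBolt:
    """Hypothetical stand-in for a Storm bolt that persists tuples.

    A real bolt would subclass the bolt type of whichever Storm API is
    in use; only the Cassandra write inside it is shown here.
    """

    # Assumed table: word_counts(word text PRIMARY KEY, count bigint)
    INSERT_CQL = "INSERT INTO word_counts (word, count) VALUES (%s, %s)"

    def __init__(self, session: Any) -> None:
        # Any object exposing execute(query, parameters) will do.
        self.session = session

    def process(self, tup: Tuple[str, int]) -> None:
        """Write one (word, count) tuple to Cassandra."""
        word, count = tup
        self.session.execute(self.INSERT_CQL, (word, count))
```

In a deployment one would presumably create the session once per worker, for example with open_session(["127.0.0.1"], "analytics"), and reuse it for every tuple rather than reconnecting on each call.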

DataStax provides integration for Spark with Cassandra. If you need to integrate Spark with Cassandra, it may be worth having a look at the documentation of DataStax's Spark-Cassandra-connector project at https://github.com/datastax/spark-cassandra-connector .
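To give a feel for what the connector offers, here is a minimal PySpark sketch that loads a Cassandra table as a DataFrame. It assumes a connector version that exposes the org.apache.spark.sql.cassandra data source and that the connector package is already on Spark's classpath; the helper name load_cassandra_table and the test keyspace and kv table in the usage note are hypothetical. Older connector releases may expose a different, RDD-based API, so the project documentation should be checked for the version in use.

```python
from typing import Any, Dict


def cassandra_read_options(keyspace: str, table: str) -> Dict[str, str]:
    """Build the options that identify a Cassandra table to the connector."""
    return {"keyspace": keyspace, "table": table}


def load_cassandra_table(spark: Any, keyspace: str, table: str) -> Any:
    """Load a Cassandra table as a DataFrame through the connector.

    `spark` is expected to be a pyspark.sql.SparkSession whose
    spark.cassandra.connection.host setting points at the cluster.
    """
    return (
        spark.read
        .format("org.apache.spark.sql.cassandra")  # connector data source
        .options(**cassandra_read_options(keyspace, table))
        .load()
    )
```

With a configured session, load_cassandra_table(spark, "test", "kv") would return a DataFrame that can then be filtered or aggregated like any other Spark DataFrame.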

One may want to look into DataStax Enterprise Edition for the built-in integration of some of the popular analytical engines with Cassandra.
