Integration with other analytical tools
Although Hadoop and its companion projects are the most widely used tools for the analysis of large datasets, the recent demand for real-time analytics and machine learning has produced several very successful tools. Some of these tools store their own data. MongoDB, for example, is primarily a database but also ships with decent built-in analytics tooling, and Druid (https://github.com/metamx/druid) claims to be a column store (like Cassandra) with fast analytical tooling. Others, such as Twitter Storm (https://storm.apache.org/), which provides real-time stream analysis, and Spark or Shark (https://spark.apache.org/), do not have a data store of their own; instead, databases can be plugged into their frameworks. The scope of this chapter does not allow us to cover how to set up each of these tools; however, getting them working with Cassandra is not especially painful.
The easiest way to integrate Storm with Cassandra is probably to write the read or write code yourself, using the Cassandra driver inside your Spout and/or Bolt definitions. You may also look into a somewhat older Cassandra-Storm integration project at https://github.com/ptgoetz/storm-cassandra.
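The bolt-side approach described above can be sketched roughly as follows. This is a minimal illustration written in Python so that it runs without a cluster; it is not Storm's or the DataStax driver's API. `RecordingSession` and `RecordingCollector` are hypothetical stand-ins for a driver session and Storm's output collector, and the `events` table with its `id` and `payload` columns is assumed. What the sketch tries to show is the usual shape of such a bolt: the session is opened once in `prepare()` rather than in the constructor, it is reused for every tuple, and a tuple is acked only after its write succeeds.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


# Hypothetical stand-ins: they only imitate the shape of a Cassandra driver
# session and a Storm output collector so the bolt logic below can run
# without a Storm cluster or a Cassandra node.
class RecordingSession:
    """Stand-in for a driver session; records every statement it is given."""

    def __init__(self) -> None:
        self.executed: List[Tuple[str, tuple]] = []

    def execute(self, query: str, parameters: tuple = ()) -> None:
        self.executed.append((query, parameters))


class RecordingCollector:
    """Stand-in for Storm's OutputCollector; records acked and failed tuples."""

    def __init__(self) -> None:
        self.acked: List[Dict[str, Any]] = []
        self.failed: List[Dict[str, Any]] = []

    def ack(self, tup: Dict[str, Any]) -> None:
        self.acked.append(tup)

    def fail(self, tup: Dict[str, Any]) -> None:
        self.failed.append(tup)


class CassandraWriterBolt:
    """Writes each incoming tuple to Cassandra, then acks or fails it."""

    # Hypothetical table and columns; %s placeholders follow the Python
    # driver's convention for simple (non-prepared) statements.
    INSERT_CQL = "INSERT INTO events (id, payload) VALUES (%s, %s)"

    def __init__(self, session_factory: Callable[[], Any]) -> None:
        # Keep only a factory here: Storm serializes bolt instances before
        # shipping them to workers, so live connections belong in prepare().
        self._session_factory = session_factory
        self._session: Optional[Any] = None
        self._collector: Optional[Any] = None

    def prepare(self, conf: Dict[str, Any], context: Any, collector: Any) -> None:
        # Called once per bolt instance on the worker: open the session here
        # and reuse it for every tuple.
        self._session = self._session_factory()
        self._collector = collector

    def execute(self, tup: Dict[str, Any]) -> None:
        try:
            self._session.execute(self.INSERT_CQL, (tup["id"], tup["payload"]))
        except Exception:
            # With a reliable spout, Storm can replay a failed tuple.
            self._collector.fail(tup)
            return
        # Ack only after the write has succeeded.
        self._collector.ack(tup)


if __name__ == "__main__":
    session = RecordingSession()
    collector = RecordingCollector()
    bolt = CassandraWriterBolt(lambda: session)
    bolt.prepare({}, None, collector)
    bolt.execute({"id": 1, "payload": "click"})
    bolt.execute({"id": 2})  # missing "payload", so the tuple is failed
    print(len(collector.acked), len(collector.failed))  # prints "1 1"
```

In a real Java topology, the same structure would sit in a class extending Storm's `BaseRichBolt`, with the session built from the DataStax Java driver inside `prepare()` and closed when the bolt shuts down.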
DataStax provides an integration between Spark and Cassandra. If you need to use Spark with Cassandra, it may be worth having a look at the documentation of DataStax's spark-cassandra-connector project at https://github.com/datastax/spark-cassandra-connector.
For built-in integration between Cassandra and some of the popular analytical engines, you may want to look into DataStax Enterprise.