Integration with Hadoop - Mastering Apache Cassandra

Chapter 8. Integration with Hadoop

Big data is the latest trend in the technical community and in the industry in general. Cassandra and many other NoSQL solutions solve a major part of the problem: storing large datasets in a scalable manner while keeping writes (mutations) and read queries fast. However, this is only half the picture. The other major part is processing the data. A database that integrates well with analytical tools such as Apache Hadoop, Twitter Storm, Pig, Spark, and other platforms is therefore the preferable choice.

Cassandra provides native support for Hadoop MapReduce, Pig, Hive, and Oozie, and only minor changes are needed to get the Hadoop family up and working with Cassandra. Third-party support for Hadoop and Solr has taken Cassandra's integration capabilities to the next level. Proprietary tools such as DataStax Enterprise Edition for Cassandra make it easy to work with Hadoop and enable full-text search over Cassandra data using Solr. The Enterprise Edition also provides support for the Spark project.

Cassandra is a very powerful database engine, and so far we have examined its salient features as a standalone piece of software. In this chapter, we will see how Cassandra can be used as a data store for third-party software such as Hadoop MapReduce and Pig.