Hadoop in a Cassandra cluster
In production, the Hadoop and Cassandra combination needs to be isolated from the nodes that
serve live traffic. The first obvious issue is that you probably do not want Hadoop jobs to keep
reading from the Cassandra nodes that serve end users, hampering their performance. The general
pattern to avoid this is to split the ring into two data centers. Cassandra automatically
replicates every write to both data centers, so the two stay in sync, subject to Cassandra's
usual eventual consistency. What's more, you can assign one of the data centers as transactional
with a higher replication factor and the other as an analytical data center with a replication
factor of 1. Hadoop then runs against the analytical data center without affecting the
transactional one.
Now, you do not really need two physically separated data centers to make this
configuration work. Remember NetworkTopologyStrategy? (Refer to Chapter 3,
Effective CQL.) You can trick Cassandra into thinking there are two data centers by just
assigning the nodes that you want to use for analytics to a different data center. You may
need to use PropertyFileSnitch and specify the details about data centers in a
cassandra-topology.properties file. So, your keyspace creation (in cassandra-cli syntax)
looks something like this:
create keyspace myKeyspace
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = {TX_DC : 2, HDP_DC : 1};
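If you work in CQL 3 rather than cassandra-cli, the same per-data-center replication is expressed as a replication map. The following is a minimal sketch, not the book's own code: build_create_keyspace is a hypothetical helper that only assembles the statement text and does not connect to a cluster.

```python
def build_create_keyspace(name, replicas_per_dc):
    """Return a CQL 3 CREATE KEYSPACE statement using NetworkTopologyStrategy.

    Hypothetical helper for illustration; it builds a string only.
    """
    options = {"class": "NetworkTopologyStrategy"}
    options.update(replicas_per_dc)

    parts = []
    for key, value in options.items():
        # In the CQL map literal, strings are single-quoted and integers are bare.
        rendered = "'" + value + "'" if isinstance(value, str) else str(value)
        parts.append("'" + key + "': " + rendered)

    body = ", ".join(parts)
    return "CREATE KEYSPACE " + name + " WITH replication = {" + body + "};"


# Same data center names and replication factors as the cassandra-cli statement.
statement = build_create_keyspace("myKeyspace", {"TX_DC": 2, "HDP_DC": 1})
print(statement)
```

The data center names in the map must match the names the snitch reports. Note that CQL folds unquoted identifiers such as myKeyspace to lowercase.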
This keyspace definition names two data centers: TX_DC for transactional purposes and
HDP_DC for analytics in Hadoop. The snitch's topology file maps every node in the ring to a
data center and a rack, and is configured like this:
# Transaction Data Center
192.168.1.1=TX_DC:RAC1
192.168.1.2=TX_DC:RAC1
192.168.2.1=TX_DC:RAC2
# Analytics Data Center
192.168.1.3=HDP_DC:RAC1
192.168.2.2=HDP_DC:RAC2
192.168.2.3=HDP_DC:RAC2
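A quick way to sanity-check such a layout is to group the topology entries by data center and confirm that each replication factor fits the number of nodes assigned to it. The sketch below assumes the six entries shown above; parse_topology and check_replication are hypothetical helpers written for this illustration, not part of Cassandra or Hadoop.

```python
from collections import defaultdict

# The topology entries from the text, in cassandra-topology.properties format.
TOPOLOGY = """\
# Transaction Data Center
192.168.1.1=TX_DC:RAC1
192.168.1.2=TX_DC:RAC1
192.168.2.1=TX_DC:RAC2
# Analytics Data Center
192.168.1.3=HDP_DC:RAC1
192.168.2.2=HDP_DC:RAC2
192.168.2.3=HDP_DC:RAC2
"""


def parse_topology(text):
    """Map each data center name to a list of (ip, rack) pairs."""
    nodes_by_dc = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ip, _, location = line.partition("=")
        if ip.strip() == "default":
            continue  # fallback entry for unknown nodes, not a real node
        dc, _, rack = location.partition(":")
        nodes_by_dc[dc.strip()].append((ip.strip(), rack.strip()))
    return dict(nodes_by_dc)


def check_replication(nodes_by_dc, replicas_per_dc):
    """List data centers whose replication factor exceeds their node count."""
    problems = []
    for dc, rf in replicas_per_dc.items():
        count = len(nodes_by_dc.get(dc, []))
        if rf > count:
            problems.append(f"{dc}: replication factor {rf} exceeds {count} node(s)")
    return problems


nodes = parse_topology(TOPOLOGY)
# Nodes that Hadoop jobs would be pointed at in this layout.
hadoop_nodes = [ip for ip, _rack in nodes["HDP_DC"]]
print(hadoop_nodes)
print(check_replication(nodes, {"TX_DC": 2, "HDP_DC": 1}))
```

With the values from the text, both data centers have three nodes, so replication factors of 2 and 1 leave no problems to report.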