Effective CQL - Mastering Apache Cassandra


When creating a keyspace, Cassandra needs two things from you: the name of the keyspace and the replication strategy. Optionally, you can add the IF NOT EXISTS clause. This is useful if you are writing a script that adds tables to an existing keyspace and do not want the script to error out on its first line, the one that creates the keyspace. You can also specify whether you want durable writes. Switching off durable writes is generally a bad idea. I agree that disabling them gives some performance gain, because writes then bypass the commit log, but it comes at the cost of possible data loss. You may get some of that gain simply by moving the commit log to a separate disk, which you do by changing the commitlog_directory setting in cassandra.yaml.

Here is how you create a keyspace:

CREATE { KEYSPACE | SCHEMA } [IF NOT EXISTS] keyspace_name
WITH REPLICATION = map
[AND DURABLE_WRITES = { true | false }];

The REPLICATION setting takes a map. In CQL (for example, in cqlsh), you write it as a JSON-like map literal, with strings in single quotes. This setting specifies how your data is replicated across the nodes. There are two strategies to choose from: SimpleStrategy and NetworkTopologyStrategy.
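To make the statement syntax and the map format concrete, here is a small Python sketch that assembles a CREATE KEYSPACE statement as a string. The helpers build_create_keyspace and cql_literal are hypothetical. They are not part of cqlsh or any Cassandra driver, and they do not validate or quote identifiers. They only show how the name, the replication map, and the optional clauses fit together.

```python
def cql_literal(value):
    """Render a Python value as a CQL literal (hypothetical helper)."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    # CQL strings use single quotes; an embedded quote is escaped by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def build_create_keyspace(name, replication, durable_writes=None, if_not_exists=False):
    """Build a CREATE KEYSPACE statement string (hypothetical helper, no identifier checks)."""
    if "class" not in replication:
        raise ValueError("the replication map needs a 'class' entry")
    parts = ["CREATE KEYSPACE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(name)
    # Render the Python dict as a CQL map literal: {'key': value, ...}
    entries = ", ".join(f"{cql_literal(k)}: {cql_literal(v)}" for k, v in replication.items())
    statement = " ".join(parts) + " WITH REPLICATION = {" + entries + "}"
    # DURABLE_WRITES is only emitted when explicitly requested.
    if durable_writes is not None:
        statement += " AND DURABLE_WRITES = " + cql_literal(durable_writes)
    return statement + ";"


print(build_create_keyspace("demo", {"class": "SimpleStrategy", "replication_factor": 3},
                            if_not_exists=True))
# CREATE KEYSPACE IF NOT EXISTS demo WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
print(build_create_keyspace("multi_dc", {"class": "NetworkTopologyStrategy", "DC1": 2, "DC2": 3}))
# CREATE KEYSPACE multi_dc WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 3};
```

In the NetworkTopologyStrategy map, each key other than 'class' is a data center name. The names DC1 and DC2 here are only examples and must match the data center names your cluster actually reports.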

SimpleStrategy

SimpleStrategy is used when you have a single data center, or when you want all nodes to be treated as if they were in a single data center. With this strategy, the first replica is placed on the node that owns the partition's token. Each additional replica is placed on the next node moving clockwise around the ring (toward increasing token numbers). SimpleStrategy is specified as follows:

{ 'class' : 'SimpleStrategy', 'replication_factor' : <positive_integer> }

Here, <positive_integer> is the number of copies of the data you want. It must be greater than zero.
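The clockwise placement rule for SimpleStrategy can be sketched in Python on a toy ring. This is a simplified model, not Cassandra's implementation, and it rests on several assumptions. Each node owns a single token (real clusters usually use virtual nodes). Tokens are small integers rather than Murmur3 hash values. The function name simple_strategy_replicas is hypothetical. A node owns the range from the previous node's token (exclusive) up to its own token (inclusive).

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, partition_token, replication_factor):
    """Toy SimpleStrategy-like placement (hypothetical helper).

    ring: dict mapping each node's token (int) to the node's name,
    assuming one token per node.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be a positive integer")
    if not ring:
        raise ValueError("the ring has no nodes")
    tokens = sorted(ring)
    # First replica: the first node whose token is >= the partition token.
    # Past the highest token, wrap around to the lowest one (index 0).
    start = bisect_left(tokens, partition_token) % len(tokens)
    replicas = []
    # Walk clockwise (increasing tokens), collecting distinct nodes until
    # we have replication_factor of them or have visited every node.
    for step in range(len(tokens)):
        node = ring[tokens[(start + step) % len(tokens)]]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


ring = {0: "A", 25: "B", 50: "C", 75: "D"}
print(simple_strategy_replicas(ring, 30, 3))  # ['C', 'D', 'A']
print(simple_strategy_replicas(ring, 80, 2))  # ['A', 'B'] (wraps past the highest token)
```

If the replication factor exceeds the number of nodes, this sketch returns every node once, because a node cannot usefully hold two copies of the same partition.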

NetworkTopologyStrategy

NetworkTopologyStrategy, as the name suggests, places replicas according to the network topology, that is, how the nodes are arranged into data centers and racks. Within a data center, replicas should be stored on nodes in different racks so that the failure of one rack does not take out every copy. If your nodes span several data centers, this strategy lets you specify how many replicas you want in each data center. It is worth noting that each data center holds a full set of the data, replicated the number of times specified for that data center. So, if you give DC1 a replication factor of 2 and DC2 a replication factor of 3, the whole corpus exists in both DC1 and DC2, with DC1 having two
