Cassandra needs two things from you when creating a keyspace: the name of the keyspace
and the replication strategy. Optionally, you can specify the IF NOT EXISTS clause if you
are writing a script that adds tables to an existing keyspace and do not want it to error
out on the first line, which creates the keyspace. You can also specify whether you want
durable writes. Switching off durable writes is generally a bad idea. While I agree that
there is some performance gain from disabling durable writes, as it bypasses the commit
log, it comes at the cost of possible data loss. You may get some performance gain just by
moving the commit log to a separate disk by changing the commitlog_directory setting in
cassandra.yaml.
Here is how you create a keyspace:
CREATE { KEYSPACE | SCHEMA } [IF NOT EXISTS]
The REPLICATION setting takes a map. If you are using cqlsh, you need to type a
JSON-style map literal to specify it. This setting specifies how you want your data to be
replicated across the nodes. There are two options: SimpleStrategy and
NetworkTopologyStrategy.
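As a rough illustration of what a complete statement looks like, here is a minimal Python sketch that renders the replication map and the CREATE KEYSPACE statement as text. The helper functions and the keyspace names demo_simple and demo_multi_dc are hypothetical and exist only for this example; the WITH REPLICATION and AND DURABLE_WRITES clauses follow standard CQL 3 syntax. The sketch does not escape quotes in names or values, so treat it as a way to see the shape of the statement rather than as production code.

```python
def render_replication(options):
    """Render a dict as the CQL map literal used by REPLICATION."""
    parts = []
    for key, value in options.items():
        # Strings are single-quoted in CQL; integers are written bare.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return "{ " + ", ".join(parts) + " }"


def create_keyspace_cql(name, replication, durable_writes=True, if_not_exists=True):
    """Build a CREATE KEYSPACE statement as a string (hypothetical helper)."""
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return (
        f"CREATE KEYSPACE {guard}{name} "
        f"WITH REPLICATION = {render_replication(replication)} "
        f"AND DURABLE_WRITES = {'true' if durable_writes else 'false'};"
    )


# One replication map per strategy; DC1 and DC2 are data center names.
simple = {"class": "SimpleStrategy", "replication_factor": 3}
per_dc = {"class": "NetworkTopologyStrategy", "DC1": 2, "DC2": 3}

simple_stmt = create_keyspace_cql("demo_simple", simple)
per_dc_stmt = create_keyspace_cql("demo_multi_dc", per_dc)
print(simple_stmt)
print(per_dc_stmt)
```

The printed statements can be pasted into cqlsh. Leaving durable_writes at its default of True matches the advice above to keep the commit log in the write path.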
SimpleStrategy
SimpleStrategy is used when you have a single data center or you want all nodes to be
treated as if they are in a single data center. In this setting, data is placed on one node
and each additional replica is placed on the next node when moving clockwise around the
ring (toward increasing token numbers). SimpleStrategy is specified as follows:
{ 'class' : 'SimpleStrategy', 'replication_factor' :
<positive_integer> }
Here, <positive_integer> is the number of copies of data you want and it should
be greater than zero.
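The clockwise placement described above can be sketched in a few lines of Python. This is a simplified model, not Cassandra's actual implementation: the four-node ring, the node names, and the small integer tokens are made up for readability, and a node is assumed to own every token up to and including its own.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """Return the nodes holding replicas for key_token.

    ring is a non-empty list of (token, node_name) pairs.
    """
    if not ring:
        raise ValueError("ring must contain at least one node")
    if replication_factor < 1:
        raise ValueError("replication_factor must be greater than zero")
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    # First node whose token is >= key_token; wrap around past the last node.
    start = bisect_left(tokens, key_token) % len(ordered)
    # A node holds at most one replica, so cap the count at the ring size.
    count = min(replication_factor, len(ordered))
    # Walk clockwise from the first replica, one node per extra copy.
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]


# Hypothetical four-node ring.
ring = [(0, "node_a"), (25, "node_b"), (50, "node_c"), (75, "node_d")]

print(simple_strategy_replicas(ring, key_token=30, replication_factor=2))
# ['node_c', 'node_d']
print(simple_strategy_replicas(ring, key_token=80, replication_factor=3))
# ['node_a', 'node_b', 'node_c']
```

The second call shows the wrap-around: a token past the last node lands on the first node of the ring, and the remaining replicas follow from there.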
NetworkTopologyStrategy
NetworkTopologyStrategy, as the name suggests, stores data depending on how
the nodes are placed. Replicas should be stored on nodes that are on different racks in the
data center so that data stays available if a rack dies. In this strategy, you can specify
how many replicas you want in each data center if your nodes span several data centers.
It is worth noting that each data center holds a full set of the data, replicated as many
times as specified for it. So, if you choose DC1 to have a replication factor of 2 and DC2
to have a replication factor of 3, the whole corpus exists in DC1 and DC2 with DC1 having two