Data replication in Cassandra
So far, we've developed a model of distribution in which the total data set is distributed among multiple machines, but any given piece of data lives on only one machine. This model carries a big advantage over a single-node configuration, which is that it's horizontally scalable. By distributing data over multiple machines, we can accommodate ever-larger data sets simply by adding more machines to our cluster.
But our current model doesn't solve the problem of fault tolerance. No hardware is perfect; any production deployment must acknowledge that a machine might fail. Our current model isn't resilient to such failures: for instance, if Node 1 in our original three-node cluster were to suddenly catch fire, we would lose all the data on that node, including the row containing alice's user record.
To solve this problem, Cassandra provides replication; in fact, no serious Cassandra deployment would store only one copy of a given piece of data. The number of copies of data stored is called the replication factor, and it's configured on a per-keyspace level. Recall the query that we used to create our my_status keyspace in Chapter 1, Getting Up and Running with Cassandra:
CREATE KEYSPACE "my_status"
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 1
};
For our development environment, we chose a replication factor of 1; there is little reason to store multiple copies of the data, since we're only using a single node for development. In a production deployment, however, we would choose a higher number; 3 is a good default.
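As a minimal sketch, the same keyspace might be defined for production like this, keeping the SimpleStrategy class we've used so far but raising the replication factor to 3; the ALTER KEYSPACE form shows how the replication settings of an already-existing keyspace can be changed in place:

CREATE KEYSPACE "my_status"
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

-- For a keyspace that already exists, the same replication
-- settings can be applied without recreating it:
ALTER KEYSPACE "my_status"
WITH REPLICATION = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};

Note that raising the replication factor on an existing keyspace only affects where new writes are placed; data written earlier must be streamed to its additional replicas by running nodetool repair on the cluster's nodes.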