Table 8.2 (continued)

Level                      Write guarantee
EACH_QUORUM                Ensure that the write has been written to <ReplicationFactor> / 2 + 1 nodes in each data center (requires NetworkTopologyStrategy).
ALL (strong consistency)   All replicas must confirm that the data was written to disk.
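
The quorum size used in the table is integer arithmetic on the replication factor. A minimal sketch of that calculation follows; the function name quorum_size is an illustrative assumption and not part of Cassandra.

```python
def quorum_size(replication_factor: int) -> int:
    """Return the number of replicas that form a quorum.

    Uses integer division, so the result is floor(RF / 2) + 1.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


if __name__ == "__main__":
    # Print the quorum size for a few replication factors.
    for rf in (1, 3, 5, 6):
        print(f"replication factor {rf}: quorum of {quorum_size(rf)}")
```

With a replication factor of 3, a quorum is 2 nodes, so a quorum write can still succeed when one replica is unavailable.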
Next, you consider what to do if one of the nodes is unavailable during a read transaction.
How can you specify the number of nodes to check before you return a new
value? Checking only one node will return a value quickly, but it may be out of date.
Checking multiple nodes may take a few milliseconds longer, but makes it far more likely
that you get the latest version in the cluster. The answer is to allow the client reader to specify
a consistency level code similar to the write codes discussed here. Cassandra clients
can select from codes of ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM,
and ALL when doing reads. You can even use the EACH_QUORUM code to
check multiple data centers around the world before the client returns a value.
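
The trade-off between checking one node and checking several can be illustrated with a small simulation. This is a simplified sketch, not Cassandra's read path: the Version class, the read function, and the choice to contact the first replicas in the list are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: int  # a larger timestamp means a newer write


def read(replicas: Sequence[Version], nodes_to_check: int) -> str:
    """Check the first `nodes_to_check` replicas and return the newest value seen."""
    if not 1 <= nodes_to_check <= len(replicas):
        raise ValueError("nodes_to_check must be between 1 and the replica count")
    contacted = replicas[:nodes_to_check]
    return max(contacted, key=lambda v: v.timestamp).value


# Three replicas; the first one missed the latest write.
replicas = [Version("old", 1), Version("new", 2), Version("new", 2)]

if __name__ == "__main__":
    print(read(replicas, 1))  # reads only the stale replica
    print(read(replicas, 2))  # a second replica supplies the newer value
```

Reading from a single node returns the stale value here, while reading from two nodes finds the newer one, at the cost of waiting for an extra response.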
As you'll see next, Cassandra uses specific configuration terms that you should
understand before you set up and configure your cluster.
8.4.1 Configuring data to node mappings with Cassandra
In our discussion of consistent hashing, we introduced the concept of using a hash to
evenly distribute data around a cluster. Cassandra uses this same concept of creating a
hash to evenly distribute its data. Before we dive into how Cassandra does this, let's
take a look at some key terms and definitions found in the Cassandra system.
ROWKEY
A rowkey is a row identifier that's hashed and used to place the data item on one or
more nodes. The rowkey is the only structure used to place data onto a node. No column
values are used to place data on nodes. Designing your rowkey structure is a critical
step to making sure similar items are clustered together for fast access.
PARTITIONER
A partitioner is the strategy that determines how to assign a row to a node based on its
key. The default setting is the random partitioner, which uses an MD5 hash of the
key to generate a consistent hash. This has the effect of randomly distributing rows
evenly over all the nodes. The other option is to use the actual bytes in a rowkey (not a
hash of the key) to place the row on a specific node.
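
The idea of hashing a rowkey to pick a node can be sketched in a few lines. This is an illustration under stated assumptions, not Cassandra's implementation: the token and owner functions, the four node names, and the evenly spaced ring are hypothetical, and Cassandra's own token arithmetic differs in detail.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

RING_SIZE = 2 ** 128  # an MD5 digest is 128 bits wide


def token(rowkey: str) -> int:
    """Map a rowkey to an integer token using its MD5 digest."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner(rowkey: str, ring: List[Tuple[int, str]]) -> str:
    """Return the first node whose token is >= the key's token, wrapping around.

    `ring` must be a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    index = bisect_left(tokens, token(rowkey)) % len(ring)
    return ring[index][1]


# Four hypothetical nodes spaced evenly around the ring.
ring = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]

if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", owner(key, ring))
```

Because the hash output looks random, keys that are adjacent in sort order usually land on different nodes, which is what spreads the load evenly. Placing rows by the raw bytes of the key instead would keep adjacent keys together.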
KEYSPACE
A keyspace is the data structure that determines how a key is replicated over multiple
nodes. For example, the replication factor might be set to 3 for any data that needs a
high degree of availability.
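
To make the replication setting concrete, the sketch below picks the nodes that would hold copies of a row once its primary node is known. It assumes the simplest placement rule (the primary node plus the next nodes clockwise on the ring); the function name replicas_for and the node names are hypothetical, and data-center-aware placement is not modeled.

```python
from typing import List, Sequence


def replicas_for(primary_index: int, nodes: Sequence[str],
                 replication_factor: int) -> List[str]:
    """Return the primary node plus the following nodes clockwise on the ring."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    return [nodes[(primary_index + i) % len(nodes)]
            for i in range(replication_factor)]


nodes = ["node0", "node1", "node2", "node3"]

if __name__ == "__main__":
    # A row whose primary node is node3, replicated three times.
    print(replicas_for(3, nodes, 3))  # ['node3', 'node0', 'node1']
```

With a replication factor of 3, the row survives the loss of any two of its three replica nodes.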
 