Building high-availability solutions with NoSQL - Making Sense of NoSQL

Table 8.2 (continued)

Level                      Write guarantee
-----                      ---------------
EACH_QUORUM                Ensure that the write has been written to
                           <ReplicationFactor> / 2 + 1 nodes in each data
                           center (requires NetworkTopologyStrategy).
ALL (strong consistency)   All replicas must confirm that the data was
                           written to disk.
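
The quorum size in the table can be worked out with a few lines of Python. This is only a sketch of the arithmetic, not Cassandra code: the function names and the data center names are made up for illustration, and the division is assumed to be integer division.

```python
from typing import Dict


def quorum(replication_factor: int) -> int:
    """Return <ReplicationFactor> / 2 + 1, using integer division."""
    return replication_factor // 2 + 1


def each_quorum_acks(replication_by_dc: Dict[str, int]) -> Dict[str, int]:
    """Acknowledgements needed in every data center for EACH_QUORUM.

    The keys are hypothetical data center names; the values are the
    replication factor configured for each one.
    """
    return {dc: quorum(rf) for dc, rf in replication_by_dc.items()}


if __name__ == "__main__":
    print(quorum(3))  # 2 of 3 replicas
    print(each_quorum_acks({"us_east": 3, "eu_west": 5}))
```

With a replication factor of 3, a quorum is 2 nodes, so a quorum write can still succeed when one replica is down.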

Next, you consider what to do if one of the nodes is unavailable during a read
transaction. How can you specify the number of nodes to check before you return a
new value? Checking only one node will return a value quickly, but it may be out of
date. Checking multiple nodes may take a few milliseconds longer, but when the read
and write levels together cover more than the replication factor (for example,
QUORUM writes with QUORUM reads), it guarantees that you get the latest version in
the cluster. The answer is to allow the client reader to specify a consistency level
code similar to the write codes discussed here. Cassandra clients can select from
codes of ONE, TWO, THREE, QUORUM, LOCAL_QUORUM, EACH_QUORUM, and ALL when doing
reads. You can even use the EACH_QUORUM code to check multiple data centers around
the world before the client returns a value.
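
The trade-off between a fast read and a fresh read can be illustrated with a toy model. The sketch below is not how a Cassandra client works internally. It assumes each replica holds a (timestamp, value) pair, that a read simply contacts the first few replicas in a list, and that the newest timestamp wins. All names are hypothetical.

```python
from typing import List, Tuple

Version = Tuple[int, str]  # (write timestamp, value) held by one replica


def read(replicas: List[Version], nodes_to_check: int) -> str:
    """Return the newest value among the first `nodes_to_check` replicas."""
    checked = replicas[:nodes_to_check]
    newest = max(checked, key=lambda version: version[0])
    return newest[1]


def read_sees_latest_write(read_nodes: int, write_nodes: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_nodes + write_nodes > replication_factor


if __name__ == "__main__":
    # A QUORUM write (2 of 3) reached the last two replicas; the first is stale.
    replicas = [(1, "old"), (2, "new"), (2, "new")]
    print(read(replicas, 1))                # old  (consistency level ONE)
    print(read(replicas, 2))                # new  (consistency level QUORUM)
    print(read_sees_latest_write(2, 2, 3))  # True
    print(read_sees_latest_write(1, 2, 3))  # False
```

Reading one node can land on the stale replica, while any two of the three replicas must include at least one that received the quorum write.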

As you'll see next, Cassandra uses specific configuration terms that you should
understand before you set up and configure your cluster.

8.4.1 Configuring data to node mappings with Cassandra

In our discussion of consistent hashing, we introduced the concept of using a hash to
evenly distribute data around a cluster. Cassandra uses this same concept of creating
a hash to evenly distribute its data. Before we dive into how Cassandra does this,
let's take a look at some key terms and definitions found in the Cassandra system.

ROWKEY

A rowkey is a row identifier that's hashed and used to place the data item on one or
more nodes. The rowkey is the only structure used to place data onto a node. No
column values are used to place data on nodes. Designing your rowkey structure is a
critical step to making sure similar items are clustered together for fast access.

PARTITIONER

A partitioner is the strategy that determines how to assign a row to a node based on
its key. The default setting is the random partitioner. Despite the name, it doesn't
pick a node at random on each write: Cassandra uses an MD5 hash of the key to
generate a consistent hash, so the same key always maps to the same node. This has
the effect of distributing rows evenly, in a random-looking pattern, over all the
nodes. The other option is to use the actual bytes in a rowkey (not a hash of the
key) to place the row on a specific node.
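
To make the hashing step concrete, here is a minimal sketch of an MD5-based partitioner in Python. It is a simplification under stated assumptions, not Cassandra's implementation: the token space is taken to be the full 128-bit MD5 range, nodes are given evenly spaced tokens, and the node names and rowkeys are hypothetical.

```python
import hashlib
from bisect import bisect_left
from typing import List, Tuple

TOKEN_SPACE = 2 ** 128  # assumption: use the whole 128-bit MD5 output

Ring = List[Tuple[int, str]]  # (token, node name), sorted by token


def md5_token(rowkey: str) -> int:
    """Hash the rowkey with MD5 and read the digest as one big integer."""
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def build_ring(node_names: List[str]) -> Ring:
    """Give each node an evenly spaced token on the ring."""
    step = TOKEN_SPACE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def node_for(rowkey: str, ring: Ring) -> str:
    """Pick the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect_left(tokens, md5_token(rowkey)) % len(ring)
    return ring[index][1]


if __name__ == "__main__":
    ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ["user:1", "user:2", "user:3"]:
        print(key, "->", node_for(key, ring))
```

Because only the rowkey is hashed, the same key always lands on the same node, while keys that look similar (such as user:1 and user:2) may land on different nodes.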

KEYSPACE

A keyspace is the data structure that determines how a key is replicated over
multiple nodes. For example, the replication factor might be set to 3 for any data
that needs a high degree of availability.
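
One way to picture a replication factor of 3 is to hash the key to a position on the ring and then copy the row to the next nodes in ring order. The sketch below assumes that simple placement rule, evenly spaced node tokens, and an MD5 hash of the key. The function and node names are hypothetical, and it ignores data centers and racks entirely.

```python
import hashlib
from bisect import bisect_left
from typing import List


def replicas_for(rowkey: str, nodes: List[str],
                 replication_factor: int = 3) -> List[str]:
    """Return the nodes that would hold copies of the row for `rowkey`."""
    token_space = 2 ** 128
    step = token_space // len(nodes)
    tokens = [i * step for i in range(len(nodes))]  # one token per node

    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    key_token = int.from_bytes(digest, "big")

    # The first replica is the node that owns the key's position on the ring.
    first = bisect_left(tokens, key_token) % len(nodes)

    # A cluster cannot hold more copies than it has nodes.
    copies = min(replication_factor, len(nodes))
    return [nodes[(first + i) % len(nodes)] for i in range(copies)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
    print(replicas_for("user:42", cluster))
```

With three copies on three different nodes, the row stays readable even if one or two of those nodes are unavailable, depending on the consistency level the client asks for.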
