Durability and Consistency
Something to always keep in mind when looking at the performance of your application is which trade-offs you are willing to make with regard to the durability of your data on write and the consistency of your data on read. Much of this is controlled by the consistency levels you set on reads and writes. For reference, a quorum is calculated as (replication_factor/2) + 1, with the division rounded down to a whole number.
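As a quick check of the quorum formula, here is a minimal Python sketch. The function name `quorum` is our own and is not part of any Cassandra API. Integer division (`//`) does the rounding down.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM operation.

    Implements (replication_factor / 2) + 1, with the division
    rounded down to a whole number.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5, 6):
    q = quorum(rf)
    # A quorum still succeeds while (rf - q) replicas are unavailable.
    print(f"RF={rf}: quorum={q}, tolerates {rf - q} down replica(s)")
```

Note that with a replication factor of 2 a quorum is both replicas, so QUORUM tolerates no more failures than ALL does.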
We have already covered consistency levels in detail, but the theory behind which consistency level to use and when, known as durability, is also important. When you are working under a write-heavy workload, it is unlikely that all the data being written is so important that it must be acknowledged by more than one replica (QUORUM, LOCAL_QUORUM, EACH_QUORUM, or ALL). Unless your node or cluster is under a heavy load, you will probably be safe using CL.ANY or CL.ONE for most writes. This reduces the amount of network
traffic and reduces the wait time on the application performing the write (which
is typically a blocking operation to begin with). If you can decide at write time
or connection time which data is important enough to require higher consistency
levels in your writes, you can save quite a bit of round-trip and wait time on your
write calls.
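The saving comes from how many acknowledgments the coordinator waits for before answering the client. The sketch below models only that arithmetic. `WriteLevel`, `acks_required`, and `choose_write_level` are hypothetical names, and the rule "only critical writes use QUORUM" is an assumed example policy, not a recommendation from Cassandra itself. Real drivers let you set the consistency level per statement or per session. This code does not talk to a cluster.

```python
from enum import Enum


class WriteLevel(Enum):
    # A subset of Cassandra's consistency level names, used here only as labels.
    ANY = "ANY"
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def acks_required(level: WriteLevel, replication_factor: int) -> int:
    """How many acknowledgments the coordinator waits for before replying.

    ANY is counted as 1 because it can be satisfied by a stored hint
    rather than by an actual replica write.
    """
    if level in (WriteLevel.ANY, WriteLevel.ONE):
        return 1
    if level is WriteLevel.QUORUM:
        return replication_factor // 2 + 1
    return replication_factor  # ALL: every replica must acknowledge


def choose_write_level(is_critical: bool) -> WriteLevel:
    # Hypothetical policy: only writes flagged as critical pay for a quorum.
    return WriteLevel.QUORUM if is_critical else WriteLevel.ONE


rf = 3
for critical in (False, True):
    level = choose_write_level(critical)
    print(level.name, acks_required(level, rf))
```

With a replication factor of 3, routine writes wait for one acknowledgment and critical writes wait for two. Deciding this per write, instead of using QUORUM everywhere, is what saves the round-trip time.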
On the read side of things, you can ask yourself a similar question: How important is the accuracy of the call I am making? Since you are working under eventual consistency, it is important to remember that the latest and greatest version of the data may not always be immediately available on every node. If you are running queries that require the latest version of the data, you may want to run the query with QUORUM, LOCAL_QUORUM, EACH_QUORUM, or ALL. When using ALL, note that the read will fail if any one of the replicas does not respond to the coordinator. If it is acceptable for the data not to carry the latest timestamp, CL.ONE may be a good option. By default, a read repair will run in the background to ensure that, for whatever query you just ran, all data is consistent.
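One way to see the read trade-off is to treat the coordinator as waiting for the k fastest replica replies, where k is set by the consistency level. The following is a deliberately simplified model. It ignores digest reads, speculative retries, and timeouts. The function `read_latency` and the latency numbers are hypothetical and serve only as illustration.

```python
from typing import Optional, Sequence


def read_latency(
    replica_latencies_ms: Sequence[Optional[float]], responses_needed: int
) -> Optional[float]:
    """Time until the coordinator has `responses_needed` replica replies.

    Each entry is one replica's response time in milliseconds, or None if
    that replica does not respond. Returns None when too few replicas
    answer, which models the read failing.
    """
    answered = sorted(t for t in replica_latencies_ms if t is not None)
    if len(answered) < responses_needed:
        return None
    # The coordinator can reply as soon as the k-th fastest replica answers.
    return answered[responses_needed - 1]


replicas = [4.0, 12.0, None]  # RF = 3, one replica unresponsive
print(read_latency(replicas, 1))  # ONE    -> 4.0
print(read_latency(replicas, 2))  # QUORUM -> 12.0
print(read_latency(replicas, 3))  # ALL    -> None (the read fails)
```

In this model, ONE is bounded by the fastest replica, QUORUM by the median one, and ALL both waits for the slowest replica and fails outright when any replica is down.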
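If latency is an issue, you should also consider using CL.ONE. If consistency is more important to you, you can ensure that a read will always reflect the most recent write by satisfying the following condition, where nodes_written is the number of replicas that must acknowledge the write and nodes_read is the number of replicas that must answer the read: (nodes_written + nodes_read) > replication_factor.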
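The condition works because, when the write and read counts together exceed the replication factor, the set of replicas read must overlap the set written in at least one node. Below is a small Python sketch that applies it to common level pairs for a replication factor of 3. The helper name `is_strongly_consistent` is our own.

```python
from itertools import product


def is_strongly_consistent(
    nodes_written: int, nodes_read: int, replication_factor: int
) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return nodes_written + nodes_read > replication_factor


rf = 3
levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
for (w_name, w), (r_name, r) in product(levels.items(), repeat=2):
    verdict = "consistent" if is_strongly_consistent(w, r, rf) else "may be stale"
    print(f"write {w_name:<6} + read {r_name:<6}: {verdict}")
```

QUORUM writes paired with QUORUM reads (2 + 2 > 3) are the usual middle ground. ONE on one side requires ALL on the other to keep the guarantee.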
When thinking about consistency levels in the context of multiple data centers, it is important to remember the additional latency incurred by waiting for responses from one or more remote data centers. Ideally, you want all of an