Chapter 11. Performance Tuning
In this chapter, we look at how to tune Cassandra to improve performance. Several isolated settings
in Cassandra's configuration file help us do this, and we also present a few pointers on hardware
selection and configuration. The defaults are often appropriate, but there might be circumstances
in which you need to change them.
As a general rule, it's important to note that simply adding nodes to a cluster will not improve
performance on its own. You need to replicate the data appropriately, then send traffic to all the
nodes from your clients. If you aren't distributing client requests, the new nodes could just stand
by somewhat idle.
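The effect of distributing requests can be illustrated with a minimal sketch. It uses no Cassandra driver at all; the node addresses and the route_requests helper are hypothetical names chosen for illustration, and plain round-robin rotation stands in for whatever balancing policy a real client would use.

```python
from collections import Counter
from itertools import cycle

# Hypothetical node addresses, for illustration only.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def route_requests(nodes, request_count):
    """Assign each request to a node in round-robin order and count the load per node."""
    rotation = cycle(nodes)
    return Counter(next(rotation) for _ in range(request_count))

# A client pinned to a single node leaves the other nodes idle.
pinned = route_requests(NODES[:1], 9)
# A client that rotates through every node spreads the same load evenly.
spread = route_requests(NODES, 9)

print(pinned)
print(spread)
```

With nine requests, the pinned client sends all nine to one node, while the rotating client sends three to each.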
We also see how to use the Python stress test tool that ships with Cassandra to run a reasonable
load against Cassandra and quickly see how it behaves under stress. We can then tune Cassandra
appropriately and feel confident that we're ready to launch in a staging environment.
Data Storage
There are two sets of files that Cassandra writes to as part of handling update operations: the
commit log and the datafile. Their different purposes need to be considered in order to
understand how to treat them during configuration.
The commit log can be thought of as short-term storage. As Cassandra receives updates, every
write value is written immediately to the commit log in the form of raw sequential file appends.
If you shut down the database or it crashes unexpectedly, the commit log can ensure that data is
not lost. That's because the next time you start the node, the commit log gets replayed. In fact,
that's the only time the commit log is read; clients never read from it. But a normal write
to the commit log blocks, so requiring clients to wait for each write to finish would damage
performance.
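The append-then-replay behavior described above can be sketched as a toy log. This is not Cassandra's commit log format or code; the TinyCommitLog class, the tab-separated line layout, and the sample keys are all assumptions made for illustration.

```python
import os
import tempfile

class TinyCommitLog:
    """Toy append-only log: every write is appended, and the file is read only on replay."""

    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        # Sequential append; flush and fsync so the entry is on disk before returning.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key}\t{value}\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Rebuild in-memory state by applying entries in order; later writes win.
        state = {}
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t", 1)
                    state[key] = value
        return state

log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
log = TinyCommitLog(log_path)
log.append("user:1", "alice")
log.append("user:2", "bob")
log.append("user:1", "alicia")

# Simulate a restart: a fresh object knows only what is on disk.
recovered = TinyCommitLog(log_path).replay()
print(recovered)
```

The fsync call in append is the blocking step: the caller waits until the entry is on disk, which is why a synchronous log write on every client request is costly.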
The datafile represents the Sorted String Tables (SSTables). Unlike the commit log, data is
written to this file asynchronously. The SSTables are periodically merged during major compactions
to free up space. To do this, Cassandra will merge keys, combine columns, and delete tombstones.
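The three compaction steps named above (merge keys, combine columns, delete tombstones) can be sketched over in-memory dictionaries. This is a simplified model rather than Cassandra's implementation: the table layout, the TOMBSTONE marker, and the compact function are hypothetical, and the sketch discards tombstones immediately instead of honoring any grace period.

```python
TOMBSTONE = object()  # Hypothetical marker for a deleted column.

def compact(sstables):
    """Merge tables shaped as {key: {column: (timestamp, value)}} into one table."""
    merged = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})  # merge keys
            for name, (ts, value) in columns.items():
                # Combine columns: keep the version with the newest timestamp.
                if name not in row or ts > row[name][0]:
                    row[name] = (ts, value)
    # Delete tombstones, drop rows left empty, and emit keys in sorted order.
    result = {}
    for key in sorted(merged):
        live = {n: v for n, v in merged[key].items() if v[1] is not TOMBSTONE}
        if live:
            result[key] = live
    return result

older = {
    "a": {"name": (1, "Ann"), "email": (1, "ann@example.com")},
    "b": {"name": (1, "Bob")},
}
newer = {
    "a": {"email": (2, TOMBSTONE)},   # email deleted after it was written
    "b": {"name": (3, "Robert")},     # name overwritten
    "c": {"name": (2, TOMBSTONE)},    # row with nothing left alive
}

compacted = compact([older, newer])
print(compacted)
```

After the merge, row a keeps only its name, row b holds the newer name, and row c disappears entirely, which is where the reclaimed space comes from.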
Read operations can refer to the in-memory cache and in this case don't need to go directly to
the datafiles on disk. If you can allow Cassandra a few gigabytes of memory, you can improve
performance dramatically when the row cache and the key cache are hit.
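Why a cache hit helps can be shown with a small sketch of a least-recently-used row cache in front of a slow store. The CachedStore class, its capacity, and the sample rows are assumptions for illustration; the sketch models only a row cache, not the key cache, and does not reflect Cassandra's actual cache implementation.

```python
from collections import OrderedDict

class CachedStore:
    """Toy read path: look in an LRU row cache first, fall back to 'disk' on a miss."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for the datafiles on disk
        self.capacity = capacity      # maximum number of cached rows
        self.cache = OrderedDict()
        self.disk_reads = 0

    def read(self, key):
        if key in self.cache:
            # Cache hit: no disk access; mark the row as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: go to disk, then remember the row.
        self.disk_reads += 1
        row = self.disk[key]
        self.cache[key] = row
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return row

store = CachedStore({"k1": "row-1", "k2": "row-2", "k3": "row-3"}, capacity=2)
for key in ["k1", "k1", "k2", "k1", "k3", "k2"]:
    store.read(key)

print(store.disk_reads)
```

Six reads cost four disk accesses here; with a capacity of three the same sequence would cost only three, which is the sense in which extra memory buys read performance.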