Performance Tuning - Cassandra: The Definitive Guide

Chapter 11. Performance Tuning

In this chapter, we look at how to tune Cassandra to improve performance. A variety of settings in the configuration file help us do this, and we present a few pointers on hardware selection and configuration. Although the defaults are often appropriate, there might be circumstances in which you need to change them, so we look at several of those settings individually.

As a general rule, simply adding nodes to a cluster will not improve performance on its own. You need to replicate the data appropriately, then send traffic to all the nodes from your clients. If you aren't distributing client requests, the new nodes could just sit mostly idle.
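
To make the point about distributing client requests concrete, here is a minimal sketch of round-robin node selection on the client side. The node addresses and the helper names (`round_robin`, `assign_requests`) are hypothetical and are not part of any Cassandra client API; the sketch only counts how many requests each node would receive.

```python
from itertools import cycle

# Hypothetical node addresses; substitute the hosts in your own cluster.
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def round_robin(nodes):
    """Return an iterator that yields node addresses in a repeating cycle."""
    return cycle(nodes)


def assign_requests(nodes, request_count):
    """Return how many of request_count requests each node would receive."""
    picker = round_robin(nodes)
    counts = {node: 0 for node in nodes}
    for _ in range(request_count):
        counts[next(picker)] += 1
    return counts


counts = assign_requests(NODES, 9)
print(counts)
```

If every request went to a single address instead, the other entries in `counts` would stay at zero, which is the "idle new node" situation described above.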

We also see how to use the Python stress test tool that ships with Cassandra to run a reasonable load against Cassandra and quickly see how it behaves under stress. We can then tune Cassandra appropriately and feel confident that we're ready to launch in a staging environment.

Data Storage

There are two sets of files that Cassandra writes to as part of handling update operations: the commit log and the datafile. Their different purposes need to be considered in order to understand how to treat them during configuration.

The commit log can be thought of as short-term storage. As Cassandra receives updates, every write value is written immediately to the commit log in the form of raw sequential file appends. If you shut down the database or it crashes unexpectedly, the commit log can ensure that data is not lost. That's because the next time you start the node, the commit log gets replayed. In fact, that's the only time the commit log is read; clients never read from it. However, a normal write to the commit log is a blocking operation, so requiring clients to wait for each write to finish would damage performance.
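
The append-then-replay behavior of the commit log can be illustrated with a toy model. This is only a sketch of the idea, not Cassandra's on-disk format or implementation: the `ToyCommitLog` class, the JSON-lines encoding, and the decision to fsync on every append are all assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyCommitLog:
    """Append-only log: writes are appended; the file is read only on replay."""

    def __init__(self, path):
        self.path = path

    def append(self, key, value):
        # Sequential append. Syncing on every write is an assumption of this
        # sketch, not a statement about Cassandra's sync policy.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Rebuild in-memory state by applying entries in the order written.
        state = {}
        if not os.path.exists(self.path):
            return state
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
        return state


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
log = ToyCommitLog(log_path)
log.append("user:1", "alice")
log.append("user:2", "bob")
log.append("user:1", "alicia")  # a later write to the same key

# Simulate a restart: a fresh object recovers state from the file alone.
recovered = ToyCommitLog(log_path).replay()
print(recovered)
```

Note that `replay` is the only method that reads the file, mirroring the point that the commit log is read only at startup.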

The datafile represents the Sorted String Tables (SSTables). Unlike the commit log, data is written to this file asynchronously. The SSTables are periodically merged during major compactions to free up space. To do this, Cassandra will merge keys, combine columns, and delete tombstones.
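
The three compaction steps named above (merge keys, combine columns, delete tombstones) can be sketched on in-memory dictionaries. This is a simplified model under stated assumptions: each toy SSTable maps a row key to columns of `(timestamp, value)` pairs, the newest timestamp wins, and tombstones are dropped immediately, ignoring any grace period a real deployment applies. The names `compact` and `TOMBSTONE` are hypothetical.

```python
TOMBSTONE = object()  # hypothetical marker for a deleted column


def compact(sstables):
    """Merge toy SSTables, keeping the newest version of each column.

    Each SSTable is a dict: row_key -> {column_name: (timestamp, value)}.
    """
    merged = {}
    for table in sstables:
        for row_key, columns in table.items():
            # Merge keys: the same row key from any table shares one row.
            row = merged.setdefault(row_key, {})
            for name, (ts, value) in columns.items():
                # Combine columns: the most recent timestamp wins.
                if name not in row or ts > row[name][0]:
                    row[name] = (ts, value)

    # Delete tombstones, drop rows left empty, and emit rows in key order.
    result = {}
    for row_key in sorted(merged):
        live = {
            name: cell
            for name, cell in merged[row_key].items()
            if cell[1] is not TOMBSTONE
        }
        if live:
            result[row_key] = live
    return result


older = {
    "alice": {"email": (1, "a@old.example"), "city": (1, "Oslo")},
    "bob": {"email": (1, "b@example.com")},
}
newer = {
    "alice": {"email": (2, "a@new.example")},
    "bob": {"email": (3, TOMBSTONE)},
}
compacted = compact([older, newer])
print(compacted)
```

After compaction, the two input tables collapse into one: the stale `email` value for `alice` is gone, and `bob` disappears entirely because his only column was deleted.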

Read operations can refer to the in-memory cache and in this case don't need to go directly to the datafiles on disk. If you can allow Cassandra a few gigabytes of memory, you can improve performance dramatically when the row cache and the key cache are hit.
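
The effect of a cache hit on disk access can be shown with a small model. The `CachedReader` class is hypothetical, a plain dictionary stands in for the datafiles on disk, and least-recently-used eviction is an assumption chosen for illustration rather than a description of Cassandra's caches.

```python
from collections import OrderedDict


class CachedReader:
    """Serve reads from an in-memory cache, falling back to 'disk' on a miss."""

    def __init__(self, disk, capacity):
        self.disk = disk            # dict standing in for on-disk datafiles
        self.capacity = capacity    # maximum number of cached entries
        self.cache = OrderedDict()
        self.disk_reads = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: no disk access; mark the entry as recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: read from disk and remember the value.
        self.disk_reads += 1
        value = self.disk[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return value


reader = CachedReader({"a": 1, "b": 2, "c": 3}, capacity=2)
for key in ["a", "a", "b", "a", "c", "b"]:
    reader.get(key)
print(reader.disk_reads)
```

Six reads here cost only four disk accesses. With a larger `capacity`, more of the working set stays in memory and the number of disk reads falls further, which is why giving the caches more memory can pay off.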
