Performance Tuning - Practical Cassandra

application's requests to be served from within the same data center, which
helps avoid a good deal of latency. Keep in mind that even at a consistency
level of ONE or LOCAL_QUORUM, a write is still sent to all replicas, including
those in other data centers. The consistency level only determines how many
replicas must respond that they received the write.
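The distinction between "replicas written to" and "replicas waited for" can be sketched in a few lines of Python. This is only an illustration, not Cassandra's coordinator code: the function names, the data center names, and the replication factors are assumptions made up for the example. The quorum formula (half the replicas, rounded down, plus one) is the standard majority calculation.

```python
# Sketch: a write goes to every replica; the consistency level only sets
# how many acknowledgements are awaited. All names here are hypothetical.

def quorum(replicas: int) -> int:
    """Majority of a replica count: floor(n / 2) + 1."""
    return replicas // 2 + 1


def write_plan(level: str, rf_by_dc: dict[str, int], local_dc: str) -> tuple[int, int]:
    """Return (replicas_written_to, acks_required) for a single write."""
    total = sum(rf_by_dc.values())  # every replica in every data center
    if level == "ONE":
        required = 1
    elif level == "LOCAL_QUORUM":
        required = quorum(rf_by_dc[local_dc])  # majority of local replicas only
    elif level == "QUORUM":
        required = quorum(total)  # majority across all data centers
    elif level == "ALL":
        required = total
    else:
        raise ValueError(f"unsupported consistency level: {level}")
    return total, required


if __name__ == "__main__":
    # Assumed topology: two data centers with three replicas each.
    rf = {"dc1": 3, "dc2": 3}
    for level in ("ONE", "LOCAL_QUORUM", "QUORUM", "ALL"):
        sent, acks = write_plan(level, rf, "dc1")
        print(f"{level}: sent to {sent} replicas, waits for {acks} acks")
```

In this assumed topology the write always reaches six replicas; only the number of acknowledgements changes with the consistency level.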

Compression

There are a few options for using compression: at the file system level, at the
ColumnFamily level, and at the network level.

Network compression is available for internode communication. In the
cassandra.yaml file, the option internode_compression controls whether traffic
moving between Cassandra nodes should be compressed. There are three choices:
no compression at all (none), compress all traffic (all), or compress only
traffic between different data centers (dc). This setting is unlikely to have a
major effect on your system whichever way you set it. The default is to
compress all traffic, and this is a sane default. Compression is CPU bound. If
you are short on CPU resources (and it's rare that Cassandra is CPU bound), not
compressing any traffic will likely net a performance bonus. You can also take
a middle path by compressing only between data centers (assuming you have more
than one data center).
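As a rough illustration of the three choices, the sketch below models which node-to-node links end up compressed under each value of internode_compression. The values all, dc, and none are the ones cassandra.yaml accepts; the function name and the data center names are hypothetical and exist only for this example, which is not how Cassandra itself makes the decision internally.

```python
# Sketch: which internode links are compressed for each setting.
# `should_compress` is a hypothetical helper, not a Cassandra API.

def should_compress(setting: str, source_dc: str, target_dc: str) -> bool:
    """Decide whether traffic between two nodes would be compressed."""
    if setting == "all":
        return True  # compress every internode connection
    if setting == "dc":
        return source_dc != target_dc  # only traffic crossing data centers
    if setting == "none":
        return False  # never spend CPU on compression
    raise ValueError(f"unknown internode_compression value: {setting}")


if __name__ == "__main__":
    # Assumed data center names for the example.
    links = [("dc1", "dc1"), ("dc1", "dc2")]
    for setting in ("all", "dc", "none"):
        for source, target in links:
            result = should_compress(setting, source, target)
            print(f"{setting}: {source} -> {target} compressed={result}")
```

With dc, traffic inside a data center skips the CPU cost of compression, while the typically slower links between data centers still benefit from it.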

Prior to Cassandra 1.1.0, compression at the ColumnFamily level was turned off
by default. The option to use either SnappyCompressor or DeflateCompressor has
been available since Cassandra 1.0.0, and from 1.1.0 onward the Java Snappy
compression library is the default compression for a ColumnFamily. Out of the
box, you can get a good speed increase simply by enabling a compression
algorithm across all of the ColumnFamilies. In addition to saving space on
disk, compression reduces actual disk I/O, which is especially true for
read-heavy workloads. Because the data on disk is compressed, Cassandra only
needs to find the location of the rows in the SSTable index and decompress the
relevant row chunks. All this ultimately means that larger data sets can fit
into memory, which means quicker access times. Writes speed up because the data
is compressed when the MemTable is flushed to disk, which results in much less
I/O. The cost is a little more CPU overhead during the flush, but this is
typically negligible compared to the performance gains. With all these factors
considered together, using compression is highly recommended.
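The idea of decompressing only the relevant chunks can be sketched with Python's standard zlib module, which implements Deflate. This is a simplified model under stated assumptions, not Cassandra's SSTable format: the 64 KB chunk size, the function names, and the sample row are all made up for the example.

```python
import zlib

# Sketch of chunked compression. The chunk size and all names are assumptions
# for illustration; this is not Cassandra's on-disk layout.
CHUNK_SIZE = 64 * 1024


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks and compress each one independently."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read `length` uncompressed bytes starting at `offset`,
    decompressing only the chunks that overlap the requested range."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    plain = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return plain[start:start + length]


if __name__ == "__main__":
    row = b"user:42|name:example|status:active\n"
    data = row * 20000  # highly repetitive sample data
    chunks = compress_chunks(data)
    stored = sum(len(c) for c in chunks)
    print(f"uncompressed: {len(data)} bytes, stored: {stored} bytes")
    # Fetch one row without decompressing the whole data set.
    print(read_range(chunks, 5 * len(row), len(row)))
```

Because each chunk is compressed on its own, a read touches only the chunks covering the requested bytes, which is the trade that lets compression reduce I/O at the price of some CPU.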
