Read performance
Reading in Cassandra is rather involved. A read may be served from memory or from the hard drive; it may need to aggregate multiple fragments of data from different SSTables; and it may have to fetch data from multiple nodes, take care of tombstones, and validate digests before the result is returned to the client. That said, the common pattern for increasing read performance in Cassandra is the same as in any other data system: cache, keeping the most frequently read data in memory, minimize disk access, and keep the search path/hops on disk small. A fast network, less communication over the network, and a low read consistency level can also help.
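As a minimal sketch of the caching and consistency levers (the keyspace, table, and column names here are hypothetical), row caching can be enabled per table, and a lower read consistency level can be selected per cqlsh session:

    -- Keep up to 100 rows per partition of this table in the row cache
    -- (Cassandra 2.1 syntax; the row cache must also be enabled in cassandra.yaml).
    ALTER TABLE my_keyspace.user_profiles
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

    -- Read at a lower consistency level so only one replica has to answer.
    CONSISTENCY ONE;
    SELECT * FROM my_keyspace.user_profiles WHERE user_id = 42;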
Choosing the right compaction strategy
With each flush of a MemTable, an immutable SSTable gets created. So, over time, there will be numerous SSTables if their number is not limited by an external process (for example, a process that merges them, deletes unused ones, or compresses them). The main problem with having lots of SSTables is slow read speed: a search may need to hop through multiple SSTables to fetch the requested data. The compaction process repeatedly merges these SSTables into one larger SSTable that holds a cleaned-up version of the data that was scattered in fragments across the smaller SSTables and littered with tombstones. This also means that compaction is quite disk I/O intensive; the longer and more frequently it runs, the more contention it produces for other Cassandra processes that need to read from or write to the disk.
Cassandra provides two compaction strategies as of version 2.1.0. The compaction strategy is a table-level setting, so you can set an appropriate compaction strategy for each table based on its behavior, as the sketch following this paragraph shows.
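For instance, assuming hypothetical table names, the strategy can be changed on an existing table with an ALTER TABLE statement:

    -- Set the compaction strategy per table to match its workload.
    ALTER TABLE my_keyspace.events
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

    -- The other strategy available as of 2.1.0:
    ALTER TABLE my_keyspace.user_profiles
    WITH compaction = {'class': 'LeveledCompactionStrategy'};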
Size-tiered compaction strategy
The size-tiered compaction strategy is the default strategy. It works as follows: as soon as the count of similar-sized SSTables reaches min_threshold (default 4), they get compacted into one bigger SSTable. As the compacted SSTables get bigger and bigger, it becomes rare for the large SSTables to be compacted any further. This leaves some very large SSTables and many smaller SSTables. It also means that updates to a row will be scattered across multiple SSTables, and a read will take longer because it has to process multiple SSTables to gather the fragments of a row.
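As a sketch (the table and its columns are hypothetical), the strategy and its min_threshold can also be set explicitly when a table is created:

    -- Compact once four similar-sized SSTables have accumulated (the default).
    CREATE TABLE my_keyspace.sensor_readings (
        sensor_id    int,
        reading_time timestamp,
        value        double,
        PRIMARY KEY (sensor_id, reading_time)
    ) WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': 4
    };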