Read performance
Reading in Cassandra is rather involved. A read may be served from memory or from the hard drive; it may need to aggregate multiple fragments of data from different SSTables; and it may have to fetch data from multiple nodes, take care of tombstones, and validate digests before the result is returned to the client. That said, the common pattern for increasing read performance in Cassandra is the same as in any other data system: cache, keeping the most frequently read data in memory, minimize disk access, and keep the search path/hops on disk small. A fast network, less communication over the network, and a low read consistency level can also help.
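As a minimal sketch of the caching and consistency levers (the keyspace, table, and column names here are hypothetical), row caching can be enabled per table, and a lower read consistency level can be selected per cqlsh session:

    -- Keep up to 100 rows per partition of this table in the row cache
    -- (Cassandra 2.1 syntax; the row cache must also be enabled in cassandra.yaml).
    ALTER TABLE my_keyspace.user_profiles
    WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};

    -- Read at a lower consistency level so only one replica has to answer.
    CONSISTENCY ONE;
    SELECT * FROM my_keyspace.user_profiles WHERE user_id = 42;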
Choosing the right compaction strategy
With each flush of a MemTable, an immutable SSTable gets created. So, over time, there will be numerous SSTables if their number is not limited by an external process (for example, a process that merges them, deletes unused ones, or compresses them). The main problem with having lots of SSTables is slow read speed: a search may need to hop through multiple SSTables to fetch the requested data. The compaction process repeatedly merges these SSTables into one larger SSTable that holds a cleaned-up version of the data that was scattered in fragments across the smaller SSTables and littered with tombstones. This also means that compaction is quite disk I/O intensive; the longer and more frequently it runs, the more contention it produces for other Cassandra processes that need to read from or write to the disk.
Cassandra provides two compaction strategies as of version 2.1.0. The compaction strategy is a table-level setting, so you can set an appropriate compaction strategy for each table based on its behavior, as the sketch following this paragraph shows.
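For instance, assuming hypothetical table names, the strategy can be changed on an existing table with an ALTER TABLE statement:

    -- Set the compaction strategy per table to match its workload.
    ALTER TABLE my_keyspace.events
    WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

    -- The other strategy available as of 2.1.0:
    ALTER TABLE my_keyspace.user_profiles
    WITH compaction = {'class': 'LeveledCompactionStrategy'};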
Size-tiered compaction strategy
The size-tiered compaction strategy is the default strategy. It works as follows: as soon as the count of similar-sized SSTables reaches min_threshold (default 4), they get compacted into one bigger SSTable. As the compacted SSTables get bigger and bigger, it becomes rare for the large SSTables to be compacted any further. This leaves some very large SSTables and many smaller SSTables. It also means that updates to a row will be scattered across multiple SSTables, and a read will take longer because it has to process multiple SSTables to gather the fragments of a row.
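As a sketch (the table and its columns are hypothetical), the strategy and its min_threshold can also be set explicitly when a table is created:

    -- Compact once four similar-sized SSTables have accumulated (the default).
    CREATE TABLE my_keyspace.sensor_readings (
        sensor_id    int,
        reading_time timestamp,
        value        double,
        PRIMARY KEY (sensor_id, reading_time)
    ) WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': 4
    };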