or the index file, we can get some help from the sampled index in memory. Looking
through the sampled index, Cassandra finds that it contains the row key 400 and another,
624. So, the row fragments for key 404 may be in this SSTable. More importantly, the sampled
index gives the offset of the entry for key 400 in the index file. Cassandra now scans the
index file from the entry for 400 and reaches the entry for 404. That entry gives Cassandra
the offset of the 404 key in the SSTable data file, and it reads from there. The following
figure shows the Cassandra SSTable index in action:
If you followed the example, you will have noticed that the smaller the sampling interval,
the more keys are held in memory; and the smaller the block of the index file that must be
read from disk, the faster the lookup. This is a trade-off between memory usage and read
performance.
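The lookup described above can be sketched in a few lines of Python. This is only an
illustration, not Cassandra's actual code: the names INDEX_FILE, build_sample, and lookup
are hypothetical, the index "file" is a plain in-memory list, and the offsets are made up.

```python
from bisect import bisect_right

# Hypothetical index file: (row_key, offset_in_data_file), sorted by row key.
INDEX_FILE = [
    (100, 0), (212, 512), (300, 1024), (400, 1536),
    (404, 2048), (512, 2560), (624, 3072), (700, 3584),
]


def build_sample(index_entries, interval):
    """Keep every `interval`-th entry as (row_key, position_in_index_file)."""
    return [(index_entries[i][0], i)
            for i in range(0, len(index_entries), interval)]


def lookup(sample, index_entries, key):
    """Return the data-file offset for `key`, or None if it is absent."""
    sampled_keys = [k for k, _ in sample]
    # Find the last sampled key that is <= the key we are looking for.
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before everything in this SSTable
    start = sample[slot][1]
    # Scan the index file forward from the sampled position.
    for row_key, data_offset in index_entries[start:]:
        if row_key == key:
            return data_offset
        if row_key > key:
            break  # passed the place where the key would be
    return None


sample = build_sample(INDEX_FILE, 3)   # sampled keys: 100, 400, 624
offset = lookup(sample, INDEX_FILE, 404)  # scans 400, then 404 -> 2048
```

With an interval of 3, only three keys stay in memory and at most three index entries are
scanned per lookup. An interval of 1 would keep all eight keys in memory and need no scan,
which is the memory-versus-speed trade-off in miniature.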
Data files
Data files hold the actual data. They contain row keys, metadata, and columns (partial or
full). Reading a row from a data file takes just one disk seek followed by a sequential read,
because the offset of the row key has already been obtained from the associated index file.
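A minimal sketch of "one seek, then a sequential read" follows. The record layout used here
(a 4-byte key, a 4-byte length, then the payload) is an assumption made purely for
illustration and is not Cassandra's on-disk format; write_row and read_row are hypothetical
helpers, and an in-memory buffer stands in for the data file.

```python
import io
import struct

HEADER = struct.Struct(">II")  # assumed layout: row key, payload length


def write_row(data_file, key, payload):
    """Append one row and return the offset at which it starts."""
    offset = data_file.tell()
    data_file.write(HEADER.pack(key, len(payload)))
    data_file.write(payload)
    return offset


def read_row(data_file, offset):
    """Seek once to the known offset, then read the row sequentially."""
    data_file.seek(offset)
    key, length = HEADER.unpack(data_file.read(HEADER.size))
    return key, data_file.read(length)


data_file = io.BytesIO()
offsets = {
    400: write_row(data_file, 400, b"row for key 400"),
    404: write_row(data_file, 404, b"row for key 404"),
}
# The offset would normally come from the index file lookup.
key, payload = read_row(data_file, offsets[404])
```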
Compaction
As we discussed earlier in the Read in action section, a read request may require Cassandra
to read across multiple SSTables to get a result. This is wasteful: it costs multiple
(disk) seeks, may require conflict resolution, and, if there are too many SSTables, may
slow down the read. To handle this problem, Cassandra has a process in place, namely
compaction. Compaction merges multiple SSTable files into one. Out of the box, Cassandra
offers two types of compaction mechanisms: size-tiered compaction strategy and leveled
compaction strategy (refer to the Read performance section in Chapter 5, Performance
Tuning). This section stays focused on the size-tiered compaction mechanism for better
understanding.
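The core idea of merging several sorted SSTables into one can be sketched as follows. This
is a simplified illustration and not Cassandra's implementation: each SSTable is modeled as
a list of (row_key, column, timestamp, value) tuples sorted by row key and column, conflicts
are resolved by keeping the cell with the highest timestamp, and tombstones and on-disk
formats are ignored. The function name compact is hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def cell_id(cell):
    """A cell is identified by its row key and column name."""
    return cell[0], cell[1]


def compact(sstables):
    """Merge sorted SSTables into one, keeping the newest version of each cell."""
    # Each input is already sorted, so a streaming k-way merge is enough.
    merged = heapq.merge(*sstables, key=cell_id)
    result = []
    for _, versions in groupby(merged, key=cell_id):
        # Conflict resolution: the cell with the latest timestamp wins.
        result.append(max(versions, key=itemgetter(2)))
    return result


older = [(400, "name", 1, "alice"), (404, "name", 1, "bob")]
newer = [(404, "name", 2, "robert"), (624, "name", 2, "carol")]
compacted = compact([older, newer])
```

After the merge, key 404 appears once, with the value written at timestamp 2, so a later read
for that key touches one SSTable instead of two.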
The compaction process starts when the number of SSTables on disk reaches a certain
(configurable) threshold. Although the merge process is somewhat I/O intensive, it pays off
in the long term through fewer disk seeks during reads. Apart from this, there are a
few other benefits of compaction, as follows: