• Removal of expired tombstones (Cassandra v0.8+)
• Merging row fragments
• Rebuilding primary and secondary indexes
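The row-fragment merge and tombstone removal listed above can be sketched in Python. This is a simplified model, not Cassandra's implementation: the `Cell` type and the `merge_row_fragments` function are hypothetical, a row fragment is assumed to be a dict from column name to `Cell`, and the write with the newest timestamp wins. `gc_grace_seconds` borrows the name of Cassandra's table-level grace period setting, but here it is just a plain parameter, and any further safety checks Cassandra applies before purging a tombstone are omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int          # write timestamp; the newest write wins
    value: Optional[str]    # None marks a tombstone (a deleted column)
    deleted_at: int = 0     # time (in seconds) when the tombstone was written


def merge_row_fragments(fragments, now, gc_grace_seconds):
    """Merge fragments of one row found in different SSTables (toy model)."""
    merged = {}
    for fragment in fragments:
        for column, cell in fragment.items():
            current = merged.get(column)
            # Last write wins: keep the cell with the newest timestamp.
            if current is None or cell.timestamp > current.timestamp:
                merged[column] = cell
    # Drop tombstones older than the grace period (simplified purge rule).
    return {
        column: cell
        for column, cell in merged.items()
        if not (cell.value is None and now - cell.deleted_at > gc_grace_seconds)
    }


old = {"name": Cell(1, "alice"), "email": Cell(1, "a@example.com")}
new = {"email": Cell(5, None, deleted_at=100), "city": Cell(3, "Pune")}
# The newer tombstone shadows the old email; once expired it is dropped,
# leaving only the "name" and "city" columns.
print(merge_row_fragments([old, new], now=100 + 864001, gc_grace_seconds=864000))
```

If `now` is still inside the grace period, the tombstone survives the merge so that it can keep shadowing older copies of the column.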
Merging is not as painful as it may seem, because SSTables are already sorted. (Remember
merge sort?) Merging produces larger files, but the old files are not deleted immediately. For
example, let's say you have the compaction threshold set to four. Cassandra initially flushes
SSTables that are the same size as the MemTable. When the number of SSTables reaches the
threshold, the compaction thread is triggered and compacts the four equal-sized SSTables
into one. Temporarily, you will have up to twice the total SSTable data on your disk. Another
thing to note is that the SSTables merged together are all the same size. So, once four
SSTables have been merged into a larger SSTable of size, say, G, the next tier fills up with
SSTables of size G, and those are merged only when there are four of them. Each higher-tier
compaction therefore needs even more temporary disk space while merging.
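Because every SSTable is already sorted by key, compaction can stream through its inputs like the merge step of merge sort. The sketch below assumes, purely for illustration, that an SSTable is a Python list of `(key, timestamp, value)` tuples sorted by key; `compact` is a hypothetical name, and real SSTables are on-disk files with their own indexes and Bloom filters. The merged output is built as a new list while the inputs are left untouched, which mirrors why disk usage grows temporarily until the old files are removed.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sstables):
    """Merge sorted SSTables into one sorted list, keeping the newest write per key.

    Each input is assumed to be a list of (key, timestamp, value) tuples
    sorted by key (a simplified, hypothetical SSTable model).
    """
    # heapq.merge lazily interleaves the sorted inputs in key order,
    # like the merge step of merge sort; nothing is re-sorted.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    result = []
    for _key, versions in groupby(merged, key=itemgetter(0)):
        # The same key can appear in several SSTables; keep the newest version.
        result.append(max(versions, key=itemgetter(1)))
    return result


older = [("k1", 1, "a"), ("k3", 1, "c")]
newer = [("k1", 2, "A"), ("k2", 1, "b")]
print(compact([older, newer]))
# [('k1', 2, 'A'), ('k2', 1, 'b'), ('k3', 1, 'c')]
```

The inputs are consumed one entry at a time, so memory use stays small even though the output is as large as the combined inputs when no keys overlap.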
After merging, the old SSTables are marked as deletable. They are deleted during a JVM
garbage collection cycle or when Cassandra restarts.
Compaction runs on each node independently and does not affect other nodes. This
automatic, system-controlled compaction that runs regularly is called minor compaction.
Cassandra has more than one type of compaction. The other kind is, unsurprisingly,
called major compaction.
What is a major compaction? A major compaction takes all the SSTables and merges them
into one single SSTable. It is somewhat confusing that a minor compaction merges
SSTables and a major one does too, but there is a difference. Take the size-tiered
compaction strategy, for example: it merges SSTables of the same size. So, if your
threshold is four, Cassandra starts a merge when it finds four SSTables of the same size.
If your system starts with four SSTables of size X, after the compaction you will end up
with one SSTable of size 4X. The next time you have four X-sized SSTables, you will end
up with two 4X tables, and so on. (These larger SSTables are merged once 16 X-sized
SSTables have been compacted into four 4X tables.) After a really long time, you will end
up with a couple of really big SSTables, a handful of large SSTables, and many smaller
SSTables. This is the result of continuous minor compaction, and it means a query may
have to read from several SSTables to get its data. When you run a major compaction, all
the big and small SSTables are merged into one. This is the only benefit of major
compaction.
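The tiering described above can be simulated with a toy model in Python. The assumptions are deliberately simple and not taken from Cassandra's code: every flush produces an SSTable of size 1 (one X), a minor compaction merges exactly `threshold` SSTables of identical size, and `flush_and_compact` and `major_compact` are hypothetical helper names. In practice, size-tiered compaction groups SSTables of similar rather than strictly identical size.

```python
from collections import Counter


def flush_and_compact(sizes, threshold=4):
    """Add one newly flushed SSTable of size 1 (X), then run minor compactions.

    `sizes` lists the current SSTable sizes in units of X (toy model).
    """
    sizes = sizes + [1]  # copy the list, then add the flushed MemTable
    while True:
        counts = Counter(sizes)
        # Size classes that have reached the compaction threshold.
        full = [size for size, count in counts.items() if count >= threshold]
        if not full:
            break
        size = min(full)
        # Replace `threshold` equal-sized SSTables with one larger SSTable.
        for _ in range(threshold):
            sizes.remove(size)
        sizes.append(size * threshold)
    return sorted(sizes, reverse=True)


def major_compact(sizes):
    """A major compaction merges every SSTable into a single one."""
    return [sum(sizes)] if sizes else []


sizes = []
for _ in range(21):
    sizes = flush_and_compact(sizes)
print(sizes)                 # [16, 4, 1]
print(major_compact(sizes))  # [21]
```

After 21 flushes the toy model holds one 16X, one 4X, and one X table, so a read may have to touch all three; a major compaction collapses them into a single 21X table.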