The compression setting is per table; if you do not specify a compression mechanism, LZ4Compressor is applied to the table by default. This is how you alter the compression type (assigning the compression setting when the table is created is covered in a separate chapter):
ALTER TABLE users
  WITH compression = {
    'sstable_compression': 'DeflateCompressor'
  };
Let's see the compression options we have.
The sstable_compression parameter specifies which compressor is used to compress the on-disk representation of an SSTable when a MemTable is flushed (compression takes place at flush time). Cassandra version 2.1.0 provides three compressors out of the box: LZ4Compressor, SnappyCompressor, and DeflateCompressor.
LZ4Compressor is 50 percent faster than SnappyCompressor, which in turn is faster than DeflateCompressor. In general, this means that when you move from DeflateCompressor to LZ4Compressor, the compressed data takes a little more space, but reads are faster.
Like many other components in Cassandra, compressors are pluggable. You can write your own compressor by implementing org.apache.cassandra.io.compress.ICompressor, compiling it, and putting the resulting .class or .jar files in the lib directory. Then provide the fully qualified class name of the compressor as the sstable_compression value.
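As a minimal sketch of that last step, the statement below points the table at a custom compressor. The class name com.example.compress.MyCompressor is hypothetical and used only for illustration; it assumes a class of that name implementing ICompressor is already in the lib directory of every node.

```cql
-- Hypothetical class name; replace it with the fully qualified
-- name of the compressor you actually deployed to lib/.
ALTER TABLE users
  WITH compression = {
    'sstable_compression': 'com.example.compress.MyCompressor'
  };
```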
The chunk length (chunk_length_kb) is the smallest block of SSTable data that gets decompressed during a read. Depending on the query pattern and the median size of the rows, this parameter can be tuned so that a chunk is big enough that a read does not have to decompress multiple chunks, but small enough that it does not decompress a lot of unneeded data. In practice, the right value is hard to guess. The most common suggestion is to keep it at 64 KB if you have no better information.
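The chunk length is set in the same compression map as the compressor. The following sketch reuses the users table from the earlier example and sets the commonly suggested 64 KB explicitly; the choice of LZ4Compressor here is only an assumption for illustration.

```cql
-- Set the compressor and the chunk length together.
-- 64 is the commonly suggested value in KB; tune it to your row sizes.
ALTER TABLE users
  WITH compression = {
    'sstable_compression': 'LZ4Compressor',
    'chunk_length_kb': 64
  };
```

The new settings apply to SSTables written after the change; existing SSTables keep their old settings until they are rewritten, for example by compaction.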
Compression can be added, removed, or altered at any time during the lifetime of a table. In general, compression boosts performance, and it is a great way to maximize the utilization of disk space. Compression gives a two- to fourfold reduction in data size