application's requests to be served from within the same data center, which will
help avoid quite a bit of latency. Keep in mind that even at a consistency level
of ONE or LOCAL_QUORUM, a write is still sent to all replicas, including those in
other data centers. The consistency level only determines how many replicas must
respond that they received the write.
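To make the distinction between "sent to" and "waited for" concrete, here is a
minimal Python sketch. It is an illustration only, not Cassandra's code: the
function names, the data center names, and the replication factors are all
hypothetical. It uses the usual quorum rule of half the replicas, rounded down,
plus one.

```python
from typing import Dict


def quorum(replicas: int) -> int:
    """Smallest majority of a replica set: floor(replicas / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, rf_by_dc: Dict[str, int], local_dc: str) -> int:
    """How many replica acknowledgements a write waits for (illustrative)."""
    if level == "ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own data center are counted.
        return quorum(rf_by_dc[local_dc])
    if level == "QUORUM":
        # Replicas in every data center are counted.
        return quorum(sum(rf_by_dc.values()))
    raise ValueError("level not covered by this sketch: " + level)


# Hypothetical cluster: two data centers, replication factor 3 in each.
rf_by_dc = {"dc1": 3, "dc2": 3}
total_replicas = sum(rf_by_dc.values())

for level in ("ONE", "LOCAL_QUORUM", "QUORUM"):
    acks = required_acks(level, rf_by_dc, local_dc="dc1")
    print(f"{level}: sent to {total_replicas} replicas, waits for {acks}")
```

In every case the write goes to all six replicas; only the number of
acknowledgements the coordinator waits for changes.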
Compression
There are a few options for using compression and taking advantage of what it has
to offer: compression at the file system level, compression at the ColumnFamily
level, and compression at the network level.
Network compression applies to internode communication. In the cassandra.yaml
file, the internode_compression option controls whether traffic moving between
Cassandra nodes is compressed. There are three choices: compress nothing (none),
compress all traffic (all), or compress only traffic between different data
centers (dc). This setting is unlikely to have a major effect on your system
whichever way you set it. The default is to compress all traffic, and this is a
sane default. Compression is CPU bound, so if you are short on CPU resources (and
it's rare that Cassandra is CPU bound), not compressing any traffic will likely
net a performance bonus. You can also take a middle path and compress only
traffic between data centers (assuming you have more than one data center).
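The effect of the three internode_compression values can be sketched as a
small decision function. This is a model of the behavior described above, not
Cassandra's implementation; the function name and the data center names are
made up for illustration.

```python
def should_compress(setting: str, source_dc: str, target_dc: str) -> bool:
    """Model of how an internode_compression value applies to one connection."""
    if setting == "all":
        return True                    # every node-to-node connection
    if setting == "dc":
        return source_dc != target_dc  # only traffic that crosses data centers
    if setting == "none":
        return False                   # never spend CPU on compression
    raise ValueError("unknown internode_compression value: " + setting)


# Hypothetical data center names, used only for this example.
for setting in ("all", "dc", "none"):
    same_dc = should_compress(setting, "dc1", "dc1")
    cross_dc = should_compress(setting, "dc1", "dc2")
    print(f"{setting}: same data center={same_dc}, cross data center={cross_dc}")
```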
Prior to Cassandra 1.1.0, compression at the ColumnFamily level was turned
off by default. The option to use either SnappyCompressor or DeflateCompressor
has been available since Cassandra 1.0.0. From Cassandra 1.1.0 onward, the Java
Snappy compression library is the default compression for a ColumnFamily. Out of
the box, you can get a pretty good speed increase just by enabling a compression
algorithm across all of the ColumnFamilies. In addition to saving space on disk,
compression also reduces actual disk I/O. This is especially true for read-heavy
workloads. Because the data on disk is compressed, Cassandra only needs to find
the location of the rows in the SSTable index and decompress the relevant row
chunks. Compressed data is also smaller, so larger data sets can fit into memory,
which means quicker access times. The speed increase on writes happens because
the data is compressed when the MemTable is flushed to disk, which results in a
lot less I/O. On the negative side, compression adds a little more CPU overhead
to the flush. Typically, this is negligible compared to the performance gains.
With all these factors considered together, using compression is highly
recommended.
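The idea of decompressing only the relevant chunks can be illustrated with a
short Python sketch. It is a toy model, not Cassandra's SSTable format: it
uses zlib (a Deflate implementation from the Python standard library) rather
than Snappy, and the chunk size, helper names, and sample rows are assumptions
made for the example.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # chunk size assumed for this sketch


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks and compress each one independently."""
    return [zlib.compress(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]


def read_range(chunks: List[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset + length], touching only overlapping chunks."""
    if length <= 0:
        return b""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Decompress just the chunks that cover the requested byte range.
    buf = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return buf[start:start + length]


# Repetitive sample rows standing in for column data (hypothetical content).
rows = b"".join(b"user:%06d|status=active|region=us-east\n" % i
                for i in range(50000))
chunks = compress_chunks(rows)
compressed_size = sum(len(c) for c in chunks)

print("uncompressed bytes:", len(rows))
print("compressed bytes:  ", compressed_size)
print("chunks on 'disk':  ", len(chunks))
print(read_range(chunks, 1000000, 41))  # reads from a single chunk
```

Because each chunk is compressed on its own, a read of a few dozen bytes
decompresses one or two chunks rather than the whole data set, which is where
the reduction in read I/O comes from.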