when compared to an uncompressed version. It is therefore a good idea to enable compression
from the start; if needed, it can be disabled easily as follows:
-- Disable compression
ALTER TABLE users
WITH
COMPRESSION = {
'sstable_compression': ''
};
Note that enabling compression may not immediately halve the space used by
SSTables. Compression is applied only to SSTables created after it is enabled.
Over time, as compaction merges SSTables, the data in older SSTables is rewritten
in compressed form.
Tuning the bloom filter
Disk access is the most expensive operation on the read path, so Cassandra tries to avoid
reading from disk unless it has to. The bloom filter helps to identify which SSTables may
contain the row that the client has requested. However, because the bloom filter is a
probabilistic data structure, it can return false positives (refer to Chapter 2, Cassandra
Architecture). The more false positives there are, the more SSTables must be read before
Cassandra finds out whether the row actually exists in them.
The false-positive ratio is the probability that the bloom filter of an SSTable returns true
for a key that does not exist in that SSTable. In simpler words, if the false-positive ratio
is 0.5, then for keys that are not in the SSTable, about 50 percent of the time you end up
looking into the index file for a key that is not there. So, why not set the false-positive
ratio to zero and never touch the disk without being 100 percent sure? Well, it comes with a
cost: memory consumption. If you remember from Chapter 2, Cassandra Architecture, the smaller
the bloom filter, the lower the memory consumption. However, a smaller bloom filter increases
the likelihood of hash collisions, which means a higher false-positive ratio. So, as you
decrease the false-positive ratio, memory consumption shoots up. Therefore, we need to strike
a balance here.
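To put rough numbers on this trade-off, the following is the standard textbook sizing formula for an optimally configured bloom filter. It is a sketch only: the symbols n, m, and p are introduced here for illustration, and Cassandra's actual filter sizing may differ in detail.

```latex
% Textbook sizing of an optimally configured bloom filter.
% Assumption: illustrative only, not necessarily Cassandra's exact implementation.
% n = number of keys, m = number of bits in the filter, p = target false-positive ratio.
\[
\frac{m}{n} = -\frac{\ln p}{(\ln 2)^2}
\]
% Worked values: bits of memory needed per key for three target ratios.
\[
p = 0.1 \;\Rightarrow\; \frac{m}{n} \approx 4.8, \qquad
p = 0.01 \;\Rightarrow\; \frac{m}{n} \approx 9.6, \qquad
p = 0.001 \;\Rightarrow\; \frac{m}{n} \approx 14.4
\]
```

Under this model, each tenfold reduction in the false-positive ratio costs roughly 4.8 more bits per key, and the cost grows without bound as p approaches zero, which is why a ratio of exactly zero is not a practical target.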
The default false-positive ratio for the bloom filter is 0.000744. To disable the bloom
filter, that is, to let every query go through to the SSTables (as if every lookup were a
false positive), set this ratio to 1.0. You may want to bypass the bloom filter in this way
if you have to scan all SSTables anyway, for example, for data mining or other analytical
applications.
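As a minimal sketch, the ratio can be changed per table through the bloom_filter_fp_chance table property. The users table is reused from the compression example, and the values shown are illustrative assumptions rather than recommendations.

```cql
-- Assumption: the same users table as in the compression example; values are illustrative.
-- Accept more false positives in exchange for a smaller in-memory filter.
ALTER TABLE users
WITH bloom_filter_fp_chance = 0.01;

-- Effectively bypass the bloom filter, for example for scan-heavy analytical workloads.
ALTER TABLE users
WITH bloom_filter_fp_chance = 1.0;
```

As with compression, expect the new setting to apply to SSTables written after the change; existing SSTables most likely keep their current filters until they are rewritten, for example by compaction.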