Cassandra Performance Tuning - Beginning Apache Cassandra Development

row_cache_save_period: 3600

# Number of keys from the row cache to save (100 here).
row_cache_keys_to_save: 100

# Directory where saved caches are written.
saved_caches_directory: /var/lib/cassandra/saved_caches

Bloom Filters

A Bloom filter is a probabilistic data structure used to test whether an element is a
member of a set. It was conceived by Burton Bloom in 1970. When queried, a Bloom filter
answers either “definitely not in set” or “may exist in a set.” A false positive (FP)
means the filter reports that an element may exist in the set when it does not. A false
negative would mean the filter reports that an element is definitely not in the set when
it is; a Bloom filter never produces false negatives.
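The two possible answers can be illustrated with a toy filter. The following is a minimal Python sketch, not Cassandra's implementation: the class name `BloomFilter`, its parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k salted hash functions (illustrative only)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in set"; True means "may exist in a set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


added_keys = ("tweet-1", "tweet-2", "tweet-3")
bf = BloomFilter(num_bits=1024, num_hashes=3)
for key in added_keys:
    bf.add(key)
```

In this sketch, `might_contain` always returns `True` for a key that was added, so a false negative cannot happen. For a key that was never added it usually returns `False`, but it can return `True` when other keys happen to have set the same bits, which is a false positive.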

Each table in Cassandra has Bloom filters, one for each of its SSTables. On a read, the
filter is consulted to check whether an SSTable may contain the requested row. Depending
on the configured false positive chance, the filter may report a row as possibly present
when it is not, but a false negative is not possible. Here a false negative would mean
that the row exists but the Bloom filter reports it as absent.

Setting the Bloom filter FP chance higher means less memory consumption, at the cost of
more unnecessary disk reads for keys that the filter wrongly reports as possibly present.
False negatives never occur at any setting, so a negative answer always means no disk I/O
for a non-existing key. The valid range for the FP chance is 0.000744 to 1.0.
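The trade-off between FP chance and memory can be estimated with the textbook sizing formula for an optimally configured Bloom filter: bits per key = -ln(p) / (ln 2)^2, where p is the FP chance. The sketch below applies that general formula. The function names `bits_per_key` and `optimal_hashes` are hypothetical, and Cassandra's actual allocation may differ from this estimate.

```python
import math


def bits_per_key(fp_chance: float) -> float:
    """Estimated bits per stored key for an optimally sized Bloom filter."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def optimal_hashes(fp_chance: float) -> int:
    """Number of hash functions that minimizes the FP rate at that size."""
    return max(1, round(-math.log2(fp_chance)))


# Compare a few FP chance settings, from the strictest allowed value upward.
for p in (0.000744, 0.004, 0.01, 0.1):
    print(f"fp_chance={p}: {bits_per_key(p):.1f} bits/key, {optimal_hashes(p)} hashes")
```

Under this formula a 1% FP chance costs about 9.6 bits per key, and every reduction in the FP chance increases the memory needed per key.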

You can set the Bloom filter FP chance when creating a table (column family), or update
it on an existing table, like this:

create table tweets(tweet_id text primary key, body text)
  with caching='rows_only'
  and bloom_filter_fp_chance=0.004;

alter table user with bloom_filter_fp_chance=0.004;

Off-Heap vs. On-Heap

Heap offloading, or off-heap allocation, means allocating memory directly from the
operating system, outside the JVM heap, whereas on-heap objects are managed by the JVM
itself and are subject to garbage collection. Cassandra also loads
