You can increase overall performance by reducing the priority of compaction threads. To do so,
use the following flag:
-Dcassandra.compaction.priority=1
This affects only the CPU usage of compaction, not its I/O.
Bloom Filters
Bloom filters are used as a performance booster. They are named for their inventor, Burton
Bloom. A Bloom filter is a very fast, probabilistic data structure for testing whether an element
is a member of a set. It is probabilistic because it is possible to get a false positive
from a Bloom filter, but never a false negative. Bloom filters work by hashing the values in a data
set into a bit array, condensing a larger data set into a compact digest. The digest, by definition,
uses much less memory than the original data would. The filters are stored in
memory and improve performance by reducing disk access on key lookups, because disk access
is typically much slower than memory access. So, in a way, a Bloom filter is a special kind
of cache. When a query is performed, the Bloom filter is checked before the disk is accessed.
Because false negatives are not possible, if the filter indicates that the element does not exist in the
set, it certainly doesn't; but if the filter indicates that the element is in the set, the disk is accessed
to make sure.
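The check-the-filter-then-go-to-disk flow described above can be sketched in a few lines of Python. This is only an illustrative toy, not how Cassandra implements its filters: the class name BloomFilter, the lookup helper, the bit-array size, the number of hashes, and the use of salted SHA-256 are all assumptions chosen for readability, and a plain dictionary stands in for data on disk.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hashed bit positions per key in an m-bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def lookup(key, bloom, disk):
    """Consult the filter first; touch the simulated disk only on a 'maybe'."""
    if not bloom.might_contain(key):
        return None  # certain miss, so no disk access is needed
    return disk.get(key)  # still None if the filter gave a false positive


disk = {"alice": 1, "bob": 2}  # stand-in for on-disk data
bloom = BloomFilter()
for k in disk:
    bloom.add(k)
```

Every key that was added sets all of its bit positions, so might_contain can never return False for it; that is why false negatives are impossible. A key that was never added can still find all of its positions set by other keys, which is the false-positive case that forces the disk check.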
A new JMX MBean feature will be added to Nodetool that allows you to check the number of
false positives that your Bloom filters are returning; this operation is called
getBloomFilterFalsePositives.
NOTE
Apache Hadoop, Google Bigtable, and Squid Proxy Cache also employ Bloom filters.
Tombstones
In the relational world, you might be used to the idea of a “soft delete.” Instead of actually
executing a delete SQL statement, the application will issue an update statement that changes a
value in a column called something like “deleted”. Programmers sometimes do this to support
audit trails, for example.
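As a minimal sketch of the relational soft-delete pattern just described, the following Python uses the standard sqlite3 module with an in-memory database. The users table, its deleted column, and the helper names soft_delete and live_users are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)


def soft_delete(conn, user_id):
    # An UPDATE stands in for DELETE; the row stays behind for auditing.
    conn.execute("UPDATE users SET deleted = 1 WHERE id = ?", (user_id,))


def live_users(conn):
    # Readers must filter out rows that are flagged as deleted.
    rows = conn.execute("SELECT name FROM users WHERE deleted = 0 ORDER BY id")
    return [name for (name,) in rows]


soft_delete(conn, 1)
```

After the call, the row for user 1 is still physically present in the table but no longer appears in live_users, which is the behavior the "deleted" column is meant to provide.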
There's a similar concept in Cassandra called a tombstone. This is how all deletes work and is
therefore automatically handled for you. When you execute a delete operation, the data is not
immediately deleted. Instead, it's treated as an update operation that places a tombstone on the