Database Reference
In-Depth Information
process. That's why it is used in Cassandra; to avoid reading many SSTables, which might
have become a bottleneck.
Note
How a bloom filter works
A bloom filter, in its simplest form, can be assumed as a bit array of length l , with all ele-
ments set to zero. It also has k predefined hash functions associated with it.
The following figure shows the bloom filter in action. It uses three hash functions and sets
the corresponding bit in the array to 1 (it might already be 1).
To add a key to a bloom filter (at the time of entering data in the associated collection), k
hashes are calculated using k predefined hash functions. A modulus of each hash value is
taken using array length l , and the value at this array position is set to 1.
The following pseudo code shows what happens:
//calculate hash, mod it to get location in bit array
arrayIndex1 = md5(v) % arrayLength
arrayIndex2 = sha1(v) % arrayLength
arrayindex3 = murmur(v) % arrayLength
//set all those indexes to 1
bitArray[arrayIndex1] = 1
bitArray[arrayIndex2] = 1
bitArray[arrayIndex3] = 1
To query the existence of a key in the bloom filter, the process is similar. Take the key and
calculate the predefined hash values. Take modulus of the hash values with the length of
the bit array. Look into those locations. If it turns out that at least one of those array loca-
tions have a zero value in them, it is certain that this value was never inserted in this
bloom filter, and hence, does not exist in the associated collection. On the other hand, if
Search WWH ::




Custom Search