Database Reference
In-Depth Information
|A|+|B|-|AB|. Now we can estimate the Jaccard Similarity of any
website's traffic pattern to the botnet profile. You can immediately
blacklist or flag for investigation those websites that are too similar.
Interesting Variations
The Bloom Filter has been around a long time and this has led to a number
of variants of the original algorithm. This section covers two of the more
interesting variations. In particular, these two variations address issues that
tend to arise in streaming situations. In these cases, the basic Bloom Filter
is likely to become saturated over time, making it somewhat useless.
Counting Bloom Filters
One of the more useful variants is the Counting Bloom Filter. This variant
was introduced to overcome the fact that elements cannot be removed from
the basic Bloom Filter implementation.
In this variant, the hashing part of the algorithm is identical to the original
algorithm. It also has the same criteria for selecting the number of registers
and the number of hash functions.
However, the registers in this variant are no longer single bits but counters.
The number of bits assigned to each counter is quite small; 4 bits have been
shown to be adequate for most applications.
To add an element, the counter is simply incremented rather than setting
the bit to 1. To remove an element, the counter for each of the k hash
functions is decremented unless the counter has reached its maximum
value. In this case, the counter is considered to have overflowed and left
permanently at its maximum value.
Allocating a generous 8 bits to each counter, the add(E arg0) method
only needs a simple change to increment the counter when it has not yet
overflowed:
public boolean add(E arg0) {
boolean added = false;
for(int i=0;i<outputs.length;i++) {
hashes[i].reset();
Search WWH ::




Custom Search