Databases Reference
In-Depth Information
return true;
}
protected int[] getHashIndexes(E obj) { ... }
}
To implement getHashIndexes() such that it works truly as k independent hash func-
tions is nontrivial. Instead, in our Bloom filter implementation in listing 5.4, we use a
hack to generate k indexes that are roughly independent and uniformly distributed.
The getHashIndexes() method seeds the Java Random number generator with an
MD5 hash of the object and then takes k “random” numbers as indexes. The Bloom fil-
ter class would benefit from a more rigorous implementation of getHashIndexes() ,
but our hack suffices for illustration purposes.
An ingenious way of creating a Bloom filter for the union of two sets is by OR'ing
the (bit array of the) Bloom filters of each individual set. As adding an object is setting
certain bits in a bit array to 1, it's easy to see why this union rule is true:
public void union(BloomFilter<E> other) {
bf.or(other.bf);
}
We'll be exploiting this union trick to build Bloom filters in a distributed fashion. Each
mapper will build a Bloom filter based on its own data split. We'll send the Bloom fil-
ters to a single reducer, which will take a union of them and record the final output.
As the Bloom filter will be shuffled around as the mappers' output, the BloomFilter
class will have to implement the Writable interface, which consists of methods
write() and readFields() . For our purpose these methods transform between the
internal BitSet representation and a byte array such that the data can be serialized to
DataInput / DataOutput . The final code is in listing 5.4.
Listing 5.4 Basic Bloom filter implementation
class BloomFilter<E> implements Writable {
private BitSet bf;
private int bitArraySize = 100000000;
private int numHashFunc = 6;
public BloomFilter() {
bf = new BitSet(bitArraySize);
}
public void add(E obj) {
int[] indexes = getHashIndexes(obj);
for (int index : indexes) {
bf.set(index);
}
}
public boolean contains(E obj) {
int[] indexes = getHashIndexes(obj);
for (int index : indexes) {
 
 
Search WWH ::




Custom Search