int m;                         // number of bits in the filter
SerializableHasher[] hashes;   // one hasher per seed
ObjectOutputStream[] outputs;  // serializes objects into the matching hasher

protected void initialize(int m, int[] seeds) throws IOException {
    this.m = m;
    bits = new BitSet(m);
    hashes = new SerializableHasher[seeds.length];
    outputs = new ObjectOutputStream[seeds.length];
    for (int i = 0; i < seeds.length; i++) {
        hashes[i] = (new SerializableHasher()).seed(seeds[i]);
        // Bytes written to outputs[i] are fed to hashes[i].
        outputs[i] = new ObjectOutputStream(hashes[i]);
    }
    bits.clear();
}

public BloomSet(int m, int[] seeds) throws IOException {
    initialize(m, seeds);
}
Together, the SerializableHasher and ObjectOutputStream arrays compute
32-bit MurmurHash values over an object's serialized form, one pair per
seed. Relying on Java's serialization mechanism makes it possible to use
this class for any object that implements Serializable.
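To make the serialization trick concrete, here is a minimal sketch of hashing an object by streaming its serialized bytes into a hash. The class names HashingStream and SerializingHasher are hypothetical stand-ins, not the book's SerializableHasher, and a seeded FNV-1a style hash is substituted for MurmurHash purely to keep the example short.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sink: folds every byte written to it into a running 32-bit hash.
// Uses a seeded FNV-1a style mix for brevity; this is NOT MurmurHash.
class HashingStream extends OutputStream {
    private static final int FNV_PRIME = 0x01000193;
    private final int seed;
    private int hash;

    HashingStream(int seed) {
        this.seed = seed;
        this.hash = seed;
    }

    @Override
    public void write(int b) {
        hash = (hash ^ (b & 0xff)) * FNV_PRIME;
    }

    // Return the hash of the bytes seen so far and start over from the seed.
    int takeHash() {
        int h = hash;
        hash = seed;
        return h;
    }
}

// Hypothetical wrapper pairing one ObjectOutputStream with one hashing sink.
class SerializingHasher {
    private final HashingStream sink;
    private final ObjectOutputStream out;

    SerializingHasher(int seed) {
        sink = new HashingStream(seed);
        try {
            out = new ObjectOutputStream(sink);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        sink.takeHash(); // discard the serialization stream header
    }

    // Hash any Serializable value by serializing it into the sink.
    int hash(Object value) {
        try {
            out.reset();            // drop cached handles so equal inputs yield equal bytes
            out.writeObject(value);
            out.flush();            // push buffered bytes through to the sink
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.takeHash();
    }
}
```

The call to reset() matters in this sketch: an ObjectOutputStream remembers objects it has already written and emits a short back-reference for repeats, which would otherwise make the same value hash differently on its second appearance.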
In practice, it is also common to implement specialized sets for storing
String or Long values. These data types appear so often in streaming
analysis that a specialized class can yield a large performance
improvement, because the hash can be computed without going through
serialization. In fact, the MurmurHash implementation in the accompanying
sample code includes versions of the hash optimized for these use cases.
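As an illustration of what such a specialization might look like, the sketch below hashes a String through its UTF-8 bytes and a long through two 32-bit blocks, with no serialization involved. It assumes the MurmurHash3 x86_32 variant; the book's sample code may use a different MurmurHash version, and the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical specialized hashes, assuming the MurmurHash3 x86_32 variant.
class SpecializedMurmur {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    // Scramble one 32-bit block before it is merged into the state.
    private static int mixK(int k) {
        k *= C1;
        k = Integer.rotateLeft(k, 15);
        return k * C2;
    }

    // Merge one full 4-byte block into the running hash state.
    private static int mixH(int h, int k) {
        h ^= mixK(k);
        h = Integer.rotateLeft(h, 13);
        return h * 5 + 0xe6546b64;
    }

    // Final avalanche step; len is the input length in bytes.
    private static int fmix(int h, int len) {
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // General byte-array version, reading blocks in little-endian order.
    static int hashBytes(byte[] data, int seed) {
        int h = seed;
        int i = 0;
        for (; i + 4 <= data.length; i += 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            h = mixH(h, k);
        }
        int k = 0;
        switch (data.length - i) {  // 0 to 3 leftover bytes
            case 3: k ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: k ^= (data[i] & 0xff);
                    h ^= mixK(k);
        }
        return fmix(h, data.length);
    }

    // String specialization: hash the UTF-8 encoding directly.
    static int hashString(String s, int seed) {
        return hashBytes(s.getBytes(StandardCharsets.UTF_8), seed);
    }

    // Long specialization: two 32-bit blocks, low word first, no allocation.
    static int hashLong(long v, int seed) {
        int h = mixH(seed, (int) v);
        h = mixH(h, (int) (v >>> 32));
        return fmix(h, 8);
    }
}
```

In this sketch hashLong produces the same value as hashBytes applied to the eight little-endian bytes of the long, so the specialized path only saves work and does not change the result.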
Adding a new element to the set is easy. First, the k hash functions are
used to generate k hash values for the element. Each hash value is mapped
to a position in the array of m bits using the modulus operator, and the
bit at that position is set to 1. A Java implementation is as follows:
public boolean add(E arg0) {
if(!contains(arg0)) {
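The add steps described above can also be sketched in a self-contained form. This is not the BloomSet class itself: BloomSketch and its index helper are hypothetical, and a simple seeded mix of hashCode() stands in for the seeded MurmurHash values. The return convention (true only when the element was not already reported as present) is an assumption based on the contains check in the listing.

```java
import java.util.BitSet;

// Hypothetical, simplified Bloom filter illustrating the add and contains steps.
class BloomSketch {
    private final int m;        // number of bits in the filter
    private final int[] seeds;  // one seed per hash function, so k = seeds.length
    private final BitSet bits;

    BloomSketch(int m, int[] seeds) {
        this.m = m;
        this.seeds = seeds.clone();
        this.bits = new BitSet(m);
    }

    // Placeholder seeded hash (NOT MurmurHash), reduced to a position in [0, m).
    private int index(Object value, int seed) {
        int h = value.hashCode() ^ seed;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        // floorMod keeps the result non-negative; the % operator would not
        // for negative hash values.
        return Math.floorMod(h, m);
    }

    // True if all k positions are set, meaning the value is possibly present.
    boolean contains(Object value) {
        for (int seed : seeds) {
            if (!bits.get(index(value, seed))) {
                return false;
            }
        }
        return true;
    }

    // Set the k positions for the value; returns false if it already appeared present.
    boolean add(Object value) {
        if (contains(value)) {
            return false;
        }
        for (int seed : seeds) {
            bits.set(index(value, seed));
        }
        return true;
    }
}
```

For example, with new BloomSketch(1024, new int[] {1, 2, 3}), the first add("a") returns true and a second add("a") returns false, because all three positions are already set.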