Database Reference
In-Depth Information
tools catering to massive data analytics, paved the path for efficient implementa-
tion of parallel Bloom filters and their variants. For a single data element, all the bit
positions of the Bloom filters could be accessed simultaneously, giving enormous
performance gains. The two approaches discussed in this chapter can trivially be
implemented in a parallel system.
13.3 STABLE BLOOM FILTER (SBF)
In data stream applications, as more and more elements arrive, the fraction of zeros
in the Bloom filter decreases continuously and the false-positive rate increases
finally reaching 1. Usual solutions employ random eviction policies for bits from the
Bloom filters to accommodate the new elements and keep the FPR at bay. An inter-
esting Bloom filter structure proposed recently is the stable Bloom filter , SBF [12].
It provides a stable performance guarantee on a very large stream. This constant
performance is of huge importance for de-duplication applications. SBF works by
continuously evicting stale information from the Bloom filters. Although it achieves
a tight upper bound on FPR, the stability of the algorithm is reached theoretically at
infinite stream length.
The regular bits of a Bloom filters are changed to a collection of bits in the Stable
Bloom filter. New elements arriving on the stream are mapped into k positions of the
Bloom filter by k uniform and independent hash functions. The membership query
for the new element proceeds as in regular Bloom filters, wherein the selected loca-
tions are probed.
Algorithm 13.1: SBF ( S )
Require: Stream ( S ) and Number of bits at each Bloom filter position ( d )
Ensure: Detecting duplicate and distinct elements in S
1: SBF is initialized to 0
2: for each e i S do
3 Select k cells of SBF
4: if none of the above cells is 0 then
5: Result ← DUPLICATE
6: else
7: Result ← DISTINCT
8: end if
9: Select p cells of SBF uniformly randomly
10: for each cells selected above do
11: if cell, C i has value ≥ 1 then
12: C i C i − 1
13: end if
14: end for
15: for each of the k cells previously selected do
16: Set cell value to Max = 2 d − 1
Search WWH ::




Custom Search