Database Reference
In-Depth Information
tools catering to massive data analytics, paved the path for efficient implementa-
tion of parallel Bloom filters and their variants. For a single data element, all the bit
positions of the Bloom filters could be accessed simultaneously, giving enormous
performance gains. The two approaches discussed in this chapter can trivially be
implemented in a parallel system.
13.3 STABLE BLOOM FILTER (SBF)
In data stream applications, as more and more elements arrive, the fraction of zeros
in the Bloom filter decreases continuously and the false-positive rate increases
finally reaching 1. Usual solutions employ random eviction policies for bits from the
Bloom filters to accommodate the new elements and keep the FPR at bay. An inter-
esting Bloom filter structure proposed recently is the
stable Bloom filter
, SBF [12].
It provides a stable performance guarantee on a very large stream. This constant
performance is of huge importance for de-duplication applications. SBF works by
continuously evicting stale information from the Bloom filters. Although it achieves
a tight upper bound on FPR, the stability of the algorithm is reached theoretically at
infinite stream length.
The regular bits of a Bloom filters are changed to a collection of bits in the Stable
Bloom filter. New elements arriving on the stream are mapped into
k
positions of the
Bloom filter by
k
uniform and independent hash functions. The membership query
for the new element proceeds as in regular Bloom filters, wherein the selected loca-
tions are probed.
Algorithm 13.1:
SBF
(
S
)
Require:
Stream (
S
) and Number of bits at each Bloom filter position (
d
)
Ensure:
Detecting
duplicate
and
distinct
elements in
S
1: SBF is initialized to 0
2:
for
each
e
i
∈
S
do
3 Select
k
cells of SBF
4:
if
none of the above cells is 0
then
5: Result ← DUPLICATE
6:
else
7: Result ← DISTINCT
8:
end if
9: Select
p
cells of SBF uniformly randomly
10:
for
each cells selected above
do
11:
if
cell,
C
i
has value ≥ 1
then
12:
C
i
←
C
i
− 1
13:
end if
14:
end for
15:
for
each of the
k
cells previously selected
do
16: Set cell value to
Max
= 2
d
− 1
Search WWH ::
Custom Search