Database Reference
In-Depth Information
17: end for
18: Output Result
19: end for
SBF is then updated by again randomly uniformly selecting p bits locations, at
which the count is decremented by 1, thereby making room for new elements of the
stream. The k positions previously selected are now set to Max = 2 d − 1, where d is
the number of bits at each position of the Bloom filter. The setting of Max value at the
SBF locations enable it to delete older data to incorporate the new arriving elements.
The pseudo-code of SBF is presented in Algorithm 13.1. The details of the working
of SBF can be found in [12].
13.3.1 s tability P roPerty
The stable Bloom filter introduces a unique property called “stability.” The stability
guarantees that the number of zeroes in SBF become constant after a number of ele-
ments arrive on the data stream. Since the false-positive rate depends on the number
of zeroes in the Bloom filters, SBF demonstrates constant FPR once the stability
point is achieved. Ref. [12] proves that at stability, the FPR and FNR of SBF are con-
stant . At this stage, the expected number of zeros in SBF are shown to be constant
for a sufficiently large stream. Interesting, the expected fractions of 0s in SBF is
monotonically nonincreasing and the FPR at stability is bounded by
k
Max
1
,
FPR
=−
1
1
11
1
+
pk
(
/
/
m
)
where m is the number of elements already occurred on the stream S. The above
equation shows a strong correlation between the FPR an the values of k and p.
Similarly, the FNR can be computed and was seen to also exhibit similar proper-
ties. The convergence of SBF to stability was shown to be at an exponential rate on
expectation. For detailed proofs and properties, the readers are referred to [12]. The
stability property provides an excellent domain for de-duplication applications to
limit the FP and FN rates to the desired acceptable tolerance level independent of
the stream length. Ref. [12] considered the underlying data distribution of the input
stream to be constant.
The working of SBF also depends on the appropriate setting of its parameters.
In [12], the parameters have been set so as to minimize the average FN rate, while
bounding the FP rate within acceptable thresholds following the theoretical bounds
obtained. Max can be empirically set to enhance the performance of SBF.
SBF mainly suffers from the limitation that the stability is theoretically reached at
infinite stream length. Also the underlying distribution of the input data is assumed
to be constant in SBF. SBF performs substantially better than most of the other
Bloom filter approaches for duplicate detection, exhibiting low FPR and considerable
Search WWH ::




Custom Search