Database Reference
In-Depth Information
|Data set| = 1 × 10 9 , Memory = 128 MB, reshold FPR = 0.1
0.5
0.4
0.3
0.2
0.1
0
RSBF
SBF
|Data set| = 1 × 10 9 , Memory = 512 MB, reshold FPR = 0.1
0.2
0.16
0.12
0.08
0.04
0
RSBF
SBF
0
200
400
600
800
1000
Size of stream (×10 6 )
FIGURE 13.5
FNR comparison: synthetic data set.
than SBF with a stable FNR of 13%. Thus, RSBF consistently demonstrates better
FNR than SBF upto to a factor of 1.86×, for different data sets.
The significant reduction in FNR obtained by RSBF is novel with respect to stable
Bloom filters and extremely vital for practical applications such as search engines.
This performance of RSBF can be attributed to the forced insertion of a stream
element into the reservoir when the insert probability for the system falls below
the threshold p * as described earlier. This approach eliminates the possibility of an
FNR occurring due to repeated rejection of an element from being inserted into the
reservoir given the lone operation of reservoir sampling. It can also be observed that
essentially it helps RSBF to adapt its reservoir in dynamic streaming environments.
Hence, it partially acts as a simple bias function for RSBF. SBF, on the other hand
fails to meet such demands.
Ref. [12] proposes SBF having a unique feature, stability of the number of 1s
present in the Bloom filter leading to a stable performance of SBF in terms of FPR
and FNR. This stability poses an attractive feature for applications for guaranteeing
a constant performance with increasing stream lengths. However, SBF converges to
its stable point at a theoretical stream length of infinity. Practically, this represents a
very large input stream. RSBF also exhibits such stability but converges to a stable
performance at a much earlier point. This enables applications to guarantee effi-
ciency at a much smaller stream length.
Figure 13.6 compares the difference in the number of 1s for successive number
of records, in the underlying Bloom filter data structures. By studying the varia-
tion in the difference in number of 1s with increasing number of records, one gets
insights into the convergence behavior of the two algorithm. Here again, the total
number of records is around 3 M and FPR threshold used is 0.1. For 2 KB memory,
RSBF stabilizes quickly as the difference in the number of 1s stabilizes to nearly
0 at only 500 K records. However, SBF does not stabilize even at 3 M records. For
4 KB memory, RSBF observes stability at around 1.5 M records, but SBF fails to
Search WWH ::




Custom Search