Database Reference
In-Depth Information
13.6.3 Q uality C omParison
The variation of FNR and FPR along with convergence rates with increasing number
of records in the input stream are now represented. The memory used for the under-
lying Bloom filter data structure is kept constant for both SBF and RSBF in these
experiments.
Figure 13.2 presents the comparison of FPR for real data set with more than 3 M
records. Initially, until the number of input stream records reaches the threshold,
RSBF has better FPR (0.001) than SBF (around 0.0025). RSBF in this stage accepts
all the input records in its reservoir and the available memory determines the thresh-
old count. It can also be observed that upto the threshold point RSBF will not incur
any FNR.
As the number of records increase, the FPR performance of RSBF gradually
becomes comparable to that of SBF. It can be noted here that, even with a small
memory of 2 KB for around 3 M elements, the FPR achieved is quite low, 0.0025.
This demonstrates that both RSBF and SBF attain low FPR for large number of
records with a significantly small memory space.
Figure 13.3 presents the comparison of FPR for the synthetic data set with 1B
records. With 128-MB memory, as the number of records increases, the FPR for
RSBF stabilizes at 0.8%, while that for SBF stabilizes around 0.7%. With larger
memory, 512-MB memory, as the number of records increases, the FPR for both
RSBF and SBF stabilizes at around 0.06%. Thus, both RSBF and SBF attain com-
parable FPR for massive number of records, with the performance becoming nearly
equal at larger memory. The use of reservoir sampling in RSBF enables it in general
to sieve out duplicates that occur in higher probability as the stream length increases,
given the finite size of alphabet set of the input elements.
Figure 13.4 compares the FNR between RSBF and SBF with increase in the num-
ber of records. For around 3 M records and FPR threshold of 0.1, both RSBF and SBF
show an initially increases in FNR, stabilizing as the number of records increases
further. However, for both 2 KB and 4 KB memory, RSBF clearly outperforms SBF
|Data set| = 3,367,020, Memory = 2 KB, reshold FPR = 0.1
0.004
RSBF
SBF
0.0035
0.003
0.0025
0.002
0.0015
0.001
0.0005
0 0
500
1000 1500
2000
2500
3000
Size of stream (×10 3 )
FIGURE 13.2
FPR comparison.
Search WWH ::




Custom Search