Database Reference
In-Depth Information
|Data set| = 1 × 10 9 , Memory = 256 KB, FPR = 0.1
0.4
RSBF
SBF
0.35
0.30
0.25
0.20
0.15
0.10
0.05
0
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
Size of stream (×10 6 )
FIGURE 13.8
FNR stability rate: synthetic data set.
This exactly validates Equation 13.15 that the number of 1s in RSBF becomes nearly
constant much ahead of SBF.
Figure 13.8 shows the faster convergence to stability for RSBF with increase in
stream length. The increase of FNR in RSBF is around 0.1% over a stream length of
0.35 M elements, having an average deviation of 0.3 × 10 −6 per element. On the other
hand, SBF demonstrates an increase in FNR of around 0.3% over 0.3 M element with
the average deviation as 1 × 10 −6 .
13.6.4 D etaileD a nalysis
This section presents the detailed analysis of the algorithms, RSBF and SBF, com-
pared against variation of memory used and percentage of distinct elements in the
stream.
Table 13.2 presents the FNR and FPR with 100 K records and 76% distinct records
while varying memory used for the underlying Bloom filter data structure from 16 K
bits to 4.2 M bits. Here, both the FNR and FPR of RSBF and SBF are close to each
other for different values of memory used. This is due to the fact that the stream size
is quite small and neither of the structures have reached their stability point. Table
13.3 presents the FNR and FPR with 10 M records and 49% distinct records with
varying memory from 16 K bits to 4.2 M bits. Here again, comparable results for
both FNR and FPR in RSBF and SBF are observed as the number of duplicates and
TABLE 13.2
Data Set of 100 K Elements (76% Distinct)
Space (in bits)
SBF% FNR
RSBF% FNR
SBF% FPR
RSBF% FPR
16,384
85.06
84.49
10.05
11.22
65,536
74.37
74.85
8.093
8.384
4,194,304
5.51
6.29
0.00382
0.00263
 
Search WWH ::




Custom Search