Database Reference
In-Depth Information
16:
if
h
i
= 0
then
17: Find a bit in
i
ith bloom filter which is set to 1, and reset to 0.
18: Set the bit at
h
i
position to 1
19:
end if
20:
end for
21:
else
22: With probability (
s
/
iter
) insert
e
by setting all the bit positions in
H
.
23: If
e
was decided to be inserted then randomly reset one bit positions
from each of the
k
Bloom filters.
24:
end if
25:
end if
26:
iter
←
iter
+ 1
27:
end for
This transition (Event 1) may happen during any of the iterations from (
s
+ 1) to
m.
Hence,
l
∈ [
s
+ 1,
m
].
Since the different Bloom filters are independent, the final
decision (
distinct or duplicate
)
regarding
e
m
+1
is taken after probing all the bit posi-
tions of
H
, and hence,
k
s
m
≈−
ks
m
P
=−
1
1
(13.1)
H
It can be observed that the transition of the bit may also be possible during the first
s
elements of the stream (Event 2). Since the first
s
elements of the stream are always
inserted, all the bit positions in
H
should be set at least once during this period for an
element to be reported as duplicate. Therefore, using similar computations as above,
the probability that all the bits for
e
m
+1
in
H
is set in the initial
s
iterations is given by
k
1
s
m
(13.2)
P
=−
1
H
e
s
Either of the above two events will contribute to the FPR, hence the probability of
e
m
+1
being reported as an FP can be obtained using Equations 13.1 and 13.2, which
is given by
k
m
U
U
−
1
−+−
ks
m
1
s
m
P
=
1
1
(13.3)
FPR
e
The detailed proof can be found in [15]. Analyzing Equation 13.3, it can be seen
that as the stream length
m
tends to infinity, the right multiplicative factor tends
to 1. However, as
U
− 1 <
U
the left multiplicative term tends to 0. Hence as the
stream length increases, the observed FPR decreases and nearly becomes constant.
This leads to a stable performance of RSBF similar to that of SBF. However, RSBF
achieved this convergence much faster as opposed to SBF as discussed in [15].
Search WWH ::
Custom Search