Database Reference
In-Depth Information
13.5.1.1 False-Positive Rate
A false positive, FP, occurs when a distinct element of the stream is reported as a
duplicate. Consider the FP of e m +1 , the ( m + 1)t h element of the stream. The elements
of the stream are assumed to be uniformly drawn at random from a finite universe
Γ, with |Γ| = U .
Let P unique be the probability that e m +1 has not occurred in the first m elements of the
stream, and let e m +1 hash to H = { h 1 , h 2 ,…, h k } positions, where h i ∈ [1, s ] for the i th
Bloom filter. e m+ 1 will be reported as a duplicate when all the bit positions in H are set
to 1 after the first m stream elements. Since all the Bloom filters are identical and are
independently processed, the argument for one of them can be extended for the others.
Assume element e l hashes to position h 1 in the first Bloom filter. Initially, all the
bits of the Bloom filters are set to 0. Let the latest transition of h 1 , from 0 to 1, occur
at the l lth iteration, and thereafter h 1 is never reset, that is, set to 0. In RSBF a bit will
not be reset to 0 in an iteration if the stream element for the iteration is not selected
for insertion or a different bit of the Bloom filter is chosen for deletion when the ele-
ment is to be inserted. The probability of such a transition of h 1 is represented by
P trans . Therefore,
PPe
= (
is inserted
)
P e
(
selects
h
)
Ph
(
is not re
set)
trans
l
l
1
1
m
m
1
pp s
s
1
=
s
ls
1
1
=
1
=
p
(
1
)
+
1
l
i
i
s
i
m
il
=+
1
il
=+
1
Algorithm 13.2: RSBF ( S )
Require: Threshold FPR ( FPR t ), Memory in bits ( M ), and Stream ( S )
Ensure: Detecting duplicate and distinct elements in S
1: Compute the value of k from FPR t .
2: Construct k Bloom filters each having M / k bits of memory.
3: iter ← 1
4: for each element e of S do
5: Hash e into k bit positions, H = h 1 , ···, h k .
6: if all bit positions in H are set then
7: Result DUPLICATE
8: else
9: Result DISTINCT
10: end if
11: if iter s then
12: Set all the bit positions in H .
13: else
14: if ( s / iter ) ≤ p * then
15: for all positions h i in H do
Search WWH ::




Custom Search