Database Reference
In-Depth Information
13.5.1.1 False-Positive Rate
A false positive, FP, occurs when a distinct element of the stream is reported as a
duplicate. Consider the FP of
e
m
+1
, the (
m
+ 1)t h
element of the stream. The elements
of the stream are assumed to be uniformly drawn at random from a finite universe
Γ, with |Γ| =
U
.
Let
P
unique
be the probability that
e
m
+1
has not occurred in the first
m
elements of the
stream, and let
e
m
+1
hash to
H =
{
h
1
,
h
2
,…,
h
k
} positions, where
h
i
∈ [1,
s
] for the
i
th
Bloom filter.
e
m+
1
will be reported as a duplicate when all the bit positions in
H
are set
to 1 after the first
m
stream elements. Since all the Bloom filters are identical and are
independently processed, the argument for one of them can be extended for the others.
Assume element
e
l
hashes to position
h
1
in the first Bloom filter. Initially, all the
bits of the Bloom filters are set to 0. Let the latest transition of
h
1
, from 0 to 1, occur
at the
l
lth iteration, and thereafter
h
1
is never reset, that is, set to 0. In RSBF a bit will
not be reset to 0 in an iteration if the stream element for the iteration is not selected
for insertion or a different bit of the Bloom filter is chosen for deletion when the ele-
ment is to be inserted. The probability of such a transition of
h
1
is represented by
P
trans
.
Therefore,
PPe
= (
is inserted
)
P e
(
selects
h
)
Ph
(
is not re
set)
trans
l
l
1
1
m
m
1
pp
s
s
−
1
=
s
ls
1
1
=
1
∏
∏
=
p
(
1
−
)
+
1
−
l
i
i
s
i
m
il
=+
1
il
=+
1
Algorithm 13.2:
RSBF
(
S
)
Require:
Threshold FPR (
FPR
t
), Memory in bits (
M
), and Stream (
S
)
Ensure:
Detecting
duplicate
and
distinct
elements in
S
1: Compute the value of
k
from
FPR
t
.
2: Construct
k
Bloom filters each having
M
/
k
bits of memory.
3:
iter
← 1
4:
for
each element
e
of
S
do
5: Hash
e
into
k
bit positions,
H
=
h
1
, ···,
h
k
.
6:
if
all bit positions in
H
are set
then
7:
Result
←
DUPLICATE
8:
else
9:
Result
←
DISTINCT
10:
end if
11:
if
iter
≤
s
then
12: Set all the bit positions in
H
.
13:
else
14:
if
(
s
/
iter
) ≤
p
*
then
15:
for all
positions
h
i
in
H
do
Search WWH ::
Custom Search