Database Reference
In-Depth Information
13.5.1.2 False-Negative Rate
A false-negative error occurs in the stream when a duplicate element is recognized
as distinct. This section focuses on determining the probability of occurrence of an
FN. As per the working of RSBF (Algorithm 13.2), an element e will be an FN if it
has occurred in the stream earlier and one of the following two cases hold:
1. At least one of the k bits of the hash positions of e (set during the previous
occurrence of e ) has been reset during the insertion of another stream ele-
ment into the reservoir.
2. When e occurred earlier in the stream it was not inserted due to low inser-
tion probability of the stream then (by reservoir sampling). However,
according to the threshold p * in Algorithm 13.2, every distinct element in
the stream is inserted if the current insertion probability, p i p *. Therefore,
if previous appearances of e had occurred before p i was less than p * and
were not inserted, then it is likely to be detected as an FN when e repeats for
the first time after the insertion probability of the reservoir falls below p *.
Consider the probability of occurrence of an FN for an element e m +1 , at the ( m + 1)t h
iteration. Let the previous occurrence of element e m +1 be at position x , where it was
inserted into the reservoir. Therefore, Pr( e m +1 occurs at x and is inserted) = P x = p x / U.
Now, for all iterations from ( x + 1) to m , either e m +1 has not occurred in the stream or
was not inserted. Thus, Pr( e m +1 has not occurred OR e m +1 has not been inserted after
x ) is given by
mx
m
s
( mmx
Um
)
U
11
p
≤−
s
Um
s
Um
P
=
+
i
1
e
∴=
i psi
/and
is small
x
U
U
ix
=+
1
Now, P r( e m +1 was last inserted at position x ) is given by
−−
sm x
Um
(
)
s
Ux e
PPP
=
(13.4)
l
x
x
Since e m+ 1 was last inserted at position x , the k bits corresponding to e m+ 1 were all
set to 1. Therefore, e m +1 will be a FN if at least one of those k bits is reset to 0. Due
to the deletion operation in case of insertion of an element into the reservoir, some of
those k bits can be reset again. Let y be the last iteration where there is a transition
from 0 to 1 for any of the k n = bits corresponding to e m +1 , after which it is not reset
again until the m th iteration, and hence x y m. Therefore, Pr(a bit is set at y ) = P y  =
p y / s = 1/ y. Also, Pr(that bit is not reset after y ) is given as
m
+−
1
y
m
1
P
== −
p
(
1
p
)
=
(13.5)
y
i
i
s
iy
=+
1
Search WWH ::




Custom Search