Database Reference
In-Depth Information
probability that a bit remains 0 is e 1/4 . In order to be a false positive, a nonmember of
S must hash twice to bits that are 1, and this probability is (1 − e 1/4 ) 2 , or approximately
0.0493. Thus, adding a second hash function for our running example is an improvement,
reducing the false-positive rate from 0.1175 to 0.0493.
4.3.4
Exercises for Section 4.3
EXERCISE 4.3.1 For the situation of our running example (8 billion bits, 1 billion members
of the set S ), calculate the false-positive rate if we use three hash functions? What if we use
four hash functions?
! EXERCISE 4.3.2 Suppose we have n bits of memory available, and our set S has m mem-
bers. Instead of using k hash functions, we could divide the n bits into k arrays, and hash
once to each array. As a function of n , m , and k , what is the probability of a false positive?
How does it compare with using k hash functions into a single array?
!! EXERCISE 4.3.3 As a function of n , the number of bits and m the number of members in
the set S , what number of hash functions minimizes the false-positive rate?
4.4 Counting Distinct Elements in a Stream
In this section we look at a third simple kind of processing we might want to do on a stream.
As with the previous examples - sampling and filtering - it is somewhat tricky to do what
we want in a reasonable amount of main memory, so we use a variety of hashing and a ran-
domized algorithm to get approximately what we want with little space needed per stream.
4.4.1
The Count-Distinct Problem
Suppose stream elements are chosen from some universal set. We would like to know how
many different elements have appeared in the stream, counting either from the beginning
of the stream or from some known time in the past.
EXAMPLE 4.5 As a useful example of this problem, consider a Web site gathering statistics
on how many unique users it has seen in each given month. The universal set is the set
of logins for that site, and a stream element is generated each time someone logs in. This
Search WWH ::




Custom Search