Database Reference
In-Depth Information
seen in the stream (i.e., the most recent timestamp) modulo N , then we can determine from
a timestamp modulo N where in the current window the bit with that timestamp is.
We divide the window into buckets , 5 consisting of:
(1) The timestamp of its right (most recent) end.
(2) The number of 1s in the bucket. This number must be a power of 2, and we refer to the
number of 1s as the size of the bucket.
To represent a bucket, we need log 2 N bits to represent the timestamp (modulo N ) of its
right end. To represent the number of 1s we only need log 2 log 2 N bits. The reason is that
we know this number i is a power of 2, say 2 j , so we can represent i by coding j in binary.
Since j is at most log 2 N , it requires log 2 log 2 N bits. Thus, O (log N ) bits suffice to represent
a bucket.
There are six rules that must be followed when representing a stream by buckets.
• The right end of a bucket is always a position with a 1.
• Every position with a 1 is in some bucket.
• No position is in more than one bucket.
• There are one or two buckets of any given size, up to some maximum size.
• All sizes must be a power of 2.
• Buckets cannot decrease in size as we move to the left (back in time).
EXAMPLE 4.11 Figure 4.2 shows a bit stream divided into buckets in a way that satisfies
the DGIM rules. At the right (most recent) end we see two buckets of size 1. To its left we
see one bucket of size 2. Note that this bucket covers four positions, but only two of them
are 1. Proceeding left, we see two buckets of size 4, and we suggest that a bucket of size 8
exists further left.
Figure 4.2 A bit-stream divided into buckets following the DGIM rules
Notice that it is OK for some 0s to lie between buckets. Also, observe from Fig. 4.2 that
the buckets do not overlap; there are one or two of each size up to the largest size, and sizes
only increase moving left.
In the next sections, we shall explain the following about the DGIM algorithm:
Search WWH ::




Custom Search