Databases Reference
In-Depth Information
•Bucket sizes are again restrained to be nondecreasing as we go back in
time. As in Section 4.6, we can conclude that there will be O(log N )
buckets.
•The contents of a bucket consists of:
1. The size of the bucket.
2. The timestamp of the bucket, that is, the most recent point that
contributes to the bucket.
As in Section 4.6, timestamps can be
recorded modulo N .
3. A collection of records that represent the clusters into which the
points of that bucket have been partitioned. These records contain:
(a) The number of points in the cluster.
(b) The centroid or clustroid of the cluster.
(c) Any other parameters necessary to enable us to merge clusters
and maintain approximations to the full set of parameters for the
merged cluster. We shall give some examples when we discuss
the merger process in Section 7.6.4.
7.6.3
Initializing Buckets
Our smallest bucket size will be p, a power of 2. Thus, every p stream elements,
we create a new bucket, with the most recent p points. The timestamp for this
bucket is the timestamp of the most recent point in the bucket. We may leave
each point in a cluster by itself, or we may perform a clustering of these points
according to whatever clustering strategy we have chosen. For instance, if we
choose a k-means algorithm, then (assuming k < p) we cluster the points into
k clusters by some algorithm.
Whatever method we use to cluster initially, we assume it is possible to
compute the centroids or clustroids for the clusters and count the points in
each cluster. This information becomes part of the record for each cluster. We
also compute whatever other parameters for the clusters will be needed in the
merging process.
7.6.4
Merging Buckets
Following the strategy from Section 4.6, whenever we create a new bucket, we
need to review the sequence of buckets. First, if some bucket has a timestamp
that is more than N time units prior to the current time, then nothing of that
bucket is in the window, and we may drop it from the list. Second, we may have
created three buckets of size p, in which case we must merge the oldest two of
the three. The merger may create two buckets of size 2p, in which case we may
have to merge buckets of increasing sizes, recursively, just as in Section 4.6.
To merge two consecutive buckets, we need to do several things:
Search WWH ::




Custom Search