Database Reference
In-Depth Information
• Bucket sizes are again restrained to be nondecreasing as we go back in time. As in
Section 4.6 , we can conclude that there will be O (log N ) buckets.
• The contents of a bucket consists of:
(1) The size of the bucket.
(2) The timestamp of the bucket, that is, the most recent point that contributes to the
bucket. As in Section 4.6 , timestamps can be recorded modulo N .
(3) A collection of records that represent the clusters into which the points of that
bucket have been partitioned. These records contain:
(a) The number of points in the cluster.
(b) The centroid or clustroid of the cluster.
(c) Any other parameters necessary to enable us to merge clusters and maintain
approximations to the full set of parameters for the merged cluster. We shall
give some examples when we discuss the merger process in Section 7.6.4 .
7.6.3
Initializing Buckets
Our smallest bucket size will be p , a power of 2. Thus, every p stream elements, we create
a new bucket, with the most recent p points. The timestamp for this bucket is the timestamp
of the most recent point in the bucket. We may leave each point in a cluster by itself, or we
may perform a clustering of these points according to whatever clustering strategy we have
chosen. For instance, if we choose a k -means algorithm, then (assuming k < p ) we cluster
the points into k clusters by some algorithm.
Whatever method we use to cluster initially, we assume it is possible to compute the
centroids or clustroids for the clusters and count the points in each cluster. This information
becomes part of the record for each cluster. We also compute whatever other parameters
for the clusters will be needed in the merging process.
7.6.4
Merging Buckets
Following the strategy from Section 4.6 , whenever we create a new bucket, we need to re-
view the sequence of buckets. First, if some bucket has a timestamp that is more than N
time units prior to the current time, then nothing of that bucket is in the window, and we
may drop it from the list. Second, we may have created three buckets of size p , in which
case we must merge the oldest two of the three. The merger may create two buckets of size
Search WWH ::




Custom Search