Figure 10.19 Hierarchical structure for STING clustering (layers labeled: first layer, (i − 1)st layer, ith layer).
beforehand or obtained by hypothesis tests such as the χ² test. The type of distribution
of a higher-level cell can be computed based on the majority of distribution types of its
corresponding lower-level cells in conjunction with a threshold filtering process. If the
distributions of the lower-level cells disagree with each other and fail the threshold test,
the distribution type of the high-level cell is set to none.
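The majority-plus-threshold rule above can be sketched as follows. This is only an illustration: the function name `parent_distribution`, the default `threshold` of 0.7, and the choice to measure the majority as a share of child cells are assumptions, not details given in the text.

```python
from collections import Counter


def parent_distribution(child_types, threshold=0.7):
    """Return the majority distribution type of the child cells, or "none".

    child_types -- distribution label of each lower-level cell, e.g. "normal"
    threshold   -- assumed minimum share of child cells that must agree
    """
    if not child_types:
        return "none"
    # Count how many lower-level cells report each distribution type.
    votes = Counter(child_types)
    dist, n_votes = votes.most_common(1)[0]
    # The majority type is accepted only if it passes the threshold test.
    return dist if n_votes / len(child_types) >= threshold else "none"


agree = parent_distribution(["normal", "normal", "normal", "uniform"])
disagree = parent_distribution(["normal", "uniform", "exponential", "normal"])
```

Here three of four child cells agree in the first call (share 0.75), so the parent inherits their type; in the second call the largest share is 0.5, which fails the assumed threshold, so the parent type is set to none.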
“How is this statistical information useful for query answering?” The statistical
parameters can be used in a top-down, grid-based manner as follows. First, a layer within the
hierarchical structure is determined from which the query-answering process is to start.
This layer typically contains a small number of cells. For each cell in the current layer,
we compute the confidence interval (or estimated probability range) reflecting the cell's
relevancy to the given query. The irrelevant cells are removed from further consideration.
Processing of the next lower level examines only the remaining relevant cells. This
process is repeated until the bottom layer is reached. At this time, if the query
specification is met, the regions of relevant cells that satisfy the query are returned. Otherwise,
the data that fall into the relevant cells are retrieved and further processed until they
meet the query's requirements.
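The top-down pruning loop described above might look like the following minimal sketch. The `Cell` class, its `max_value` field, and the caller-supplied `is_relevant` predicate are hypothetical stand-ins: the text computes a confidence interval per cell, whereas this sketch replaces that test with a simple bound check so the control flow stays visible. It also assumes every branch of the hierarchy has the same depth.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    name: str
    max_value: float  # assumed stored statistic: largest attribute value in the cell
    children: List["Cell"] = field(default_factory=list)


def top_down_query(start_layer, is_relevant):
    """Descend layer by layer, keeping only the cells judged relevant."""
    current = list(start_layer)
    while current:
        # Drop irrelevant cells; their descendants are never examined.
        relevant = [cell for cell in current if is_relevant(cell)]
        next_layer = [child for cell in relevant for child in cell.children]
        if not next_layer:
            return relevant  # bottom layer reached (or nothing left to expand)
        current = next_layer
    return []


# Tiny two-layer hierarchy; a parent's max_value is the max over its children.
a, b = Cell("a", 10), Cell("b", 80)
c, d = Cell("c", 60), Cell("d", 20)
e, f = Cell("e", 30), Cell("f", 5)
layer = [Cell("p1", 80, [a, b]), Cell("p2", 60, [c, d]), Cell("p3", 30, [e, f])]

result = top_down_query(layer, lambda cell: cell.max_value >= 50)
names = [cell.name for cell in result]  # ['b', 'c']
```

Cell `p3` is discarded at the starting layer, so its children `e` and `f` are never visited; this is where the savings over scanning every bottom-level cell come from.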
An interesting property of STING is that it approaches the clustering result of
DBSCAN if the granularity approaches 0 (i.e., toward very low-level data). In other
words, using the count and cell size information, dense clusters can be identified
approximately using STING. Therefore, STING can also be regarded as a density-based
clustering method.
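As a rough illustration of how count and cell size can identify dense clusters, the sketch below marks a bottom-level cell as dense when count divided by area reaches a threshold, then groups neighboring dense cells. The function name, the `min_density` parameter, and the use of 4-neighbor adjacency on a two-dimensional grid are assumptions made for the example.

```python
from collections import deque


def dense_clusters(counts, cell_area, min_density):
    """Group adjacent dense cells of a 2-D grid into clusters.

    counts      -- dict mapping (row, col) to the number of objects in that cell
    cell_area   -- area of one bottom-level cell
    min_density -- assumed density threshold (objects per unit area)
    """
    dense = {cell for cell, n in counts.items() if n / cell_area >= min_density}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        # Breadth-first search over dense cells that share an edge.
        while queue:
            r, c = queue.popleft()
            cluster.append((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(cluster))
    return clusters


grid_counts = {(0, 0): 12, (0, 1): 9, (1, 1): 1, (3, 3): 15, (3, 4): 2}
found = dense_clusters(grid_counts, cell_area=1.0, min_density=5)
# found == [[(0, 0), (0, 1)], [(3, 3)]]
```

Shrinking the cells makes each dense region follow the data more closely, which is the sense in which the result approaches that of a density-based method as the granularity approaches 0.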
“What advantages does STING offer over other clustering methods?” STING offers
several advantages: (1) the grid-based computation is query-independent because the
statistical information stored in each cell represents the summary information of the
data in the grid cell, independent of the query; (2) the grid structure facilitates parallel
processing and incremental updating; and (3) the method's efficiency is a major advantage:
STING goes through the database once to compute the statistical parameters of the
cells, and hence the time complexity of generating clusters is O(n), where n is the total
number of objects. After generating the hierarchical structure, the query processing time
.
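To make the single-scan claim concrete, here is a minimal sketch that computes per-cell statistics in one pass over the objects and then derives a higher layer from the cell summaries alone. The statistics kept (count, sum, min, max), the two-dimensional grid, and the 2 × 2 grouping of cells into a parent are assumptions chosen for the example.

```python
def bottom_layer_stats(points, cell_size):
    """One pass over (x, y, value) objects; O(n) for n objects."""
    cells = {}
    for x, y, value in points:
        key = (int(x // cell_size), int(y // cell_size))
        s = cells.setdefault(
            key, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        s["count"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    return cells


def parent_layer_stats(cells):
    """Merge each 2 x 2 block of cells into one parent without rereading the data."""
    parents = {}
    for (i, j), s in cells.items():
        p = parents.setdefault(
            (i // 2, j // 2),
            {"count": 0, "sum": 0.0, "min": s["min"], "max": s["max"]},
        )
        p["count"] += s["count"]
        p["sum"] += s["sum"]
        p["min"] = min(p["min"], s["min"])
        p["max"] = max(p["max"], s["max"])
    return parents


points = [(0.5, 0.5, 10.0), (0.7, 0.2, 20.0), (1.5, 0.5, 30.0), (3.2, 3.9, 40.0)]
bottom = bottom_layer_stats(points, cell_size=1.0)
top = parent_layer_stats(bottom)
mean_top_left = top[(0, 0)]["sum"] / top[(0, 0)]["count"]  # 20.0
```

Because a parent is built only from its children's summaries, a new object can be absorbed by updating one bottom-level cell and its ancestors, and disjoint groups of cells can be merged independently, which is consistent with the incremental-updating and parallel-processing advantages listed above.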
 