Cluster Analysis: Basic Concepts and Methods - Data Mining: Concepts and Techniques - page 480


Figure 10.19 Hierarchical structure for STING clustering (layers labeled: first layer, (i − 1)st layer, ith layer).

beforehand or obtained by hypothesis tests such as the χ² test. The type of distribution of a higher-level cell can be computed based on the majority of distribution types of its corresponding lower-level cells in conjunction with a threshold filtering process. If the distributions of the lower-level cells disagree with each other and fail the threshold test, the distribution type of the high-level cell is set to none.
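The majority-plus-threshold rule can be sketched roughly as follows. This is only an illustration: the function name `parent_distribution`, the unweighted vote, and the 0.7 agreement threshold are assumptions made here, not details given in the text.

```python
from collections import Counter


def parent_distribution(child_types, threshold=0.7):
    """Return the majority distribution type of the child cells, or "none".

    child_types: distribution labels of the lower-level cells, e.g. "normal".
    threshold:   assumed minimum fraction of children that must agree.
    """
    if not child_types:
        return "none"
    label, votes = Counter(child_types).most_common(1)[0]
    # Accept the majority label only if enough children agree with it.
    return label if votes / len(child_types) >= threshold else "none"
```

With four children of which three are labeled "normal", the agreement is 0.75 and the parent is labeled "normal"; with a two-way split the threshold test fails and the parent is set to "none".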

“How is this statistical information useful for query answering?” The statistical parameters can be used in a top-down, grid-based manner as follows. First, a layer within the hierarchical structure is determined from which the query-answering process is to start. This layer typically contains a small number of cells. For each cell in the current layer, we compute the confidence interval (or estimated probability range) reflecting the cell's relevancy to the given query. The irrelevant cells are removed from further consideration. Processing of the next lower level examines only the remaining relevant cells. This process is repeated until the bottom layer is reached. At this time, if the query specification is met, the regions of relevant cells that satisfy the query are returned. Otherwise, the data that fall into the relevant cells are retrieved and further processed until they meet the query's requirements.
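The top-down pruning described above can be sketched as a short loop. The `Cell` class and the caller-supplied `is_relevant` predicate are hypothetical stand-ins: the text computes a confidence interval per cell, whereas this sketch simply assumes some test that says whether a cell is relevant. The final fallback step (retrieving the raw data in the relevant cells) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    count: int
    mean: float
    children: list = field(default_factory=list)


def top_down_query(start_layer, is_relevant):
    """Descend the hierarchy, keeping only cells the predicate marks relevant."""
    current = [c for c in start_layer if is_relevant(c)]
    # Continue while at least one surviving cell still has a lower layer.
    while any(c.children for c in current):
        next_layer = []
        for cell in current:
            if not cell.children:
                # Already a bottom-level cell: carry it forward unchanged.
                next_layer.append(cell)
                continue
            # Only the children of relevant cells are ever examined.
            next_layer.extend(ch for ch in cell.children if is_relevant(ch))
        current = next_layer
    return current
```

Because an irrelevant cell is dropped together with its whole subtree, the number of cells examined stays small compared with scanning every bottom-level cell.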

An interesting property of STING is that it approaches the clustering result of DBSCAN if the granularity approaches 0 (i.e., toward very low-level data). In other words, using the count and cell size information, dense clusters can be identified approximately using STING. Therefore, STING can also be regarded as a density-based clustering method.
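As a rough illustration of how count and cell size can identify dense clusters, the sketch below marks a bottom-level cell as dense when its count divided by its area reaches a threshold, then groups adjacent dense cells into regions. The function name `dense_regions`, the 4-neighbor adjacency, and the `min_density` parameter are assumptions for this example; it is a grid approximation, not DBSCAN itself.

```python
from collections import deque


def dense_regions(counts, cell_area, min_density):
    """Group adjacent dense bottom-level cells into regions.

    counts: dict mapping (row, col) -> number of objects in that cell.
    """
    dense = {rc for rc, n in counts.items() if n / cell_area >= min_density}
    regions, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, region = deque([start]), []
        # Breadth-first search over dense cells that share an edge.
        while queue:
            r, c = queue.popleft()
            region.append((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        regions.append(sorted(region))
    return regions
```

As the cells shrink, each cell covers fewer objects and the dense regions follow the data more closely, which is the intuition behind the comparison with DBSCAN.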

“What advantages does STING offer over other clustering methods?” STING offers several advantages: (1) the grid-based computation is query-independent because the statistical information stored in each cell represents the summary information of the data in the grid cell, independent of the query; (2) the grid structure facilitates parallel processing and incremental updating; and (3) the method's efficiency is a major advantage: STING goes through the database once to compute the statistical parameters of the cells, and hence the time complexity of generating clusters is O(n), where n is the total number of objects. After generating the hierarchical structure, the query processing time

.
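The single pass over the database can be sketched as follows. The record layout `(x, y, value)`, the choice of count, sum, min, and max as the stored parameters, and the 2 x 2 grouping of children per parent are all assumptions made for this example.

```python
import math
from collections import defaultdict


def _empty():
    # Summary kept per cell; the mean can be derived as sum / count.
    return {"count": 0, "sum": 0.0, "min": math.inf, "max": -math.inf}


def bottom_layer_stats(records, cell_size):
    """Single pass over (x, y, value) records, so the cost is O(n)."""
    cells = defaultdict(_empty)
    for x, y, value in records:
        key = (int(x // cell_size), int(y // cell_size))
        cell = cells[key]
        cell["count"] += 1
        cell["sum"] += value
        cell["min"] = min(cell["min"], value)
        cell["max"] = max(cell["max"], value)
    return dict(cells)


def merge_into_parents(cells):
    """Build the next layer up from child summaries, without rereading data."""
    parents = defaultdict(_empty)
    for (i, j), child in cells.items():
        parent = parents[(i // 2, j // 2)]
        parent["count"] += child["count"]
        parent["sum"] += child["sum"]
        parent["min"] = min(parent["min"], child["min"])
        parent["max"] = max(parent["max"], child["max"])
    return dict(parents)
```

Only `bottom_layer_stats` touches the raw records; every higher layer is derived from the summaries below it, which is also why the stored statistics do not depend on any particular query.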
