criteria for judging the quality of partitions. Traditional partitioning methods can
be extended for subspace clustering, rather than searching the full data space. This is
useful when there are many attributes and the data are sparse.
Achieving global optimality in partitioning-based clustering is often computation-
ally prohibitive, potentially requiring an exhaustive enumeration of all the possible
partitions. Instead, most applications adopt popular heuristic methods, such as
greedy approaches like the k-means and the k-medoids algorithms, which progres-
sively improve the clustering quality and approach a local optimum. These heuristic
clustering methods work well for finding spherical-shaped clusters in small- to
medium-size databases. To find clusters with complex shapes and for very large data
sets, partitioning-based methods need to be extended. Partitioning-based clustering
methods are studied in depth in Section 10.2.
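As a concrete illustration of the greedy, progressively improving strategy described above, the assign-then-update loop of k-means can be sketched as follows. This is a minimal two-dimensional version; the function signature, random seeding, and iteration cap are illustrative choices, not prescribed by the text:

```python
import random

def kmeans(points, k, iters=20):
    """Greedy partitioning sketch: alternate assignment and update steps,
    progressively improving cluster quality toward a local optimum.
    points: a list of (x, y) tuples; initial centers are sampled at
    random (one of several possible seeding strategies)."""
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each object joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its assigned objects.
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # no change: a local optimum is reached
            break
        centers = new_centers
    return centers, clusters
```

Because each iteration only reassigns objects to their nearest center and recomputes means, the partitioning cost never increases; the loop therefore approaches a local optimum rather than guaranteeing the globally best partition.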
Hierarchical methods: A hierarchical method creates a hierarchical decomposition of
the given set of data objects. A hierarchical method can be classified as being either
agglomerative or divisive, based on how the hierarchical decomposition is formed.
The agglomerative approach, also called the bottom-up approach, starts with each
object forming a separate group. It successively merges the objects or groups close
to one another, until all the groups are merged into one (the topmost level of the
hierarchy) or a termination condition holds. The divisive approach, also called the
top-down approach, starts with all the objects in the same cluster. In each successive
iteration, a cluster is split into smaller clusters, until eventually each object forms its
own cluster, or a termination condition holds.
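The bottom-up merging process can be sketched as follows. This minimal version uses single linkage (closest pair of points between clusters) and stops at a target cluster count; both are illustrative choices, since the text leaves the linkage criterion and termination condition open:

```python
def agglomerative(points, target=1):
    """Bottom-up hierarchical clustering sketch: start with each object as
    its own group, then repeatedly merge the two closest groups until
    `target` clusters remain (target=1 builds the full hierarchy)."""
    clusters = [[p] for p in points]
    merges = []  # record of merge steps: the hierarchical decomposition
    while len(clusters) > target:
        best = None
        # Find the closest pair of clusters under single linkage
        # (squared distances preserve the ordering of merges).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]  # merge is permanent:
        del clusters[j]                          # it is never undone
    return clusters, merges
```

Note that each merge is final, which reflects the rigidity discussed below: the method never revisits an earlier merge, which keeps the computation cheap but cannot repair a poor early decision.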
Hierarchical clustering methods can be distance-based or density- and continuity-
based. Various extensions of hierarchical methods consider clustering in subspaces
as well.
Hierarchical methods suffer from the fact that once a step (merge or split) is done,
it can never be undone. This rigidity is useful in that it leads to smaller computation
costs, because a combinatorial number of different choices need not be examined.
Such techniques cannot correct erroneous decisions; however, methods for improving
the quality of hierarchical clustering have been proposed. Hierarchical clustering
methods are studied in Section 10.3.
Density-based methods: Most partitioning methods cluster objects based on the dis-
tance between objects. Such methods can find only spherical-shaped clusters and
encounter difficulty in discovering clusters of arbitrary shapes. Other clustering
methods have been developed based on the notion of density. Their general idea
is to continue growing a given cluster as long as the density (number of objects or
data points) in the “neighborhood” exceeds some threshold. For example, for each
data point within a given cluster, the neighborhood of a given radius has to contain
at least a minimum number of points. Such a method can be used to filter out noise
or outliers and discover clusters of arbitrary shape.
Density-based methods can divide a set of objects into multiple exclusive clus-
ters, or a hierarchy of clusters. Typically, density-based methods consider exclusive
clusters only, and do not consider fuzzy clusters. Moreover, density-based methods
can be extended from full space to subspace clustering.
Density-based clustering
methods are studied in Section 10.4.
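The neighborhood-density idea above can be sketched with a simplified DBSCAN-style procedure. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of points) follow common usage; the details here are an illustrative sketch, not the only density-based formulation:

```python
def density_cluster(points, eps, min_pts):
    """Density-based clustering sketch: grow a cluster while each point's
    eps-neighborhood contains at least min_pts points; points in sparse
    regions are labeled as noise (-1). Clusters may take arbitrary shapes
    because growth follows density, not distance to a single center."""
    def neighbors(p):
        return [q for q in points
                if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps * eps]

    labels = {}  # point -> cluster id, or -1 for noise/outliers
    cid = 0
    for p in points:
        if p in labels:
            continue
        nbrs = neighbors(p)
        if len(nbrs) < min_pts:   # neighborhood too sparse: mark as noise
            labels[p] = -1
            continue
        cid += 1                  # p is dense: start a new cluster
        labels[p] = cid
        frontier = list(nbrs)
        while frontier:           # keep growing while density holds
            q = frontier.pop()
            if labels.get(q) == -1:
                labels[q] = cid   # noise reachable from a dense point
            if q in labels:       # joins the cluster as a border point
                continue
            labels[q] = cid
            q_nbrs = neighbors(q)
            if len(q_nbrs) >= min_pts:   # q is dense too: expand further
                frontier.extend(q_nbrs)
    return labels
```

Growth stops wherever the density threshold fails, so isolated points end up labeled as noise rather than being forced into the nearest cluster, which is how such methods filter out outliers.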
 