Grid-based methods: Grid-based methods quantize the object space into a finite number of cells that form a grid structure. All the clustering operations are performed on the grid structure (i.e., on the quantized space). The main advantage of this approach is its fast processing time, which is typically independent of the number of data objects and dependent only on the number of cells in each dimension in the quantized space.
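Quantization can be made concrete with a minimal Python sketch. It assumes equal-width cells, a known bounding box for the data (`lows`, `highs`), and the same number of cells in every dimension. The function names `cell_index` and `quantize` are illustrative and do not come from the text. Assigning objects to cells takes one linear pass over the data. After that pass, clustering operations can work on the per-cell counts, whose number depends on the grid size rather than on the number of objects. This is a single-resolution grid; multiresolution structures are not shown.

```python
from collections import Counter
from typing import Sequence, Tuple

def cell_index(point: Sequence[float], lows: Sequence[float],
               highs: Sequence[float], cells_per_dim: int) -> Tuple[int, ...]:
    """Return the integer grid coordinates of the cell containing `point`."""
    idx = []
    for x, lo, hi in zip(point, lows, highs):
        width = (hi - lo) / cells_per_dim
        i = int((x - lo) // width)
        # Clamp so values on the boundary (e.g., x == hi) stay inside the grid.
        idx.append(min(max(i, 0), cells_per_dim - 1))
    return tuple(idx)

def quantize(points, lows, highs, cells_per_dim) -> Counter:
    """Single pass over the data: count how many objects fall in each nonempty cell."""
    return Counter(cell_index(p, lows, highs, cells_per_dim) for p in points)

# Example: 2-D points in the unit square, 4 cells per dimension (16 cells total).
points = [(0.1, 0.1), (0.2, 0.15), (0.9, 0.9), (0.95, 0.8)]
grid = quantize(points, lows=(0.0, 0.0), highs=(1.0, 1.0), cells_per_dim=4)
print(grid)  # only the occupied cells appear
```

Only nonempty cells are stored. A sparse data set therefore yields a summary far smaller than the full grid.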
Using grids is often an efficient approach to many spatial data mining problems, including clustering. Therefore, grid-based methods can be integrated with other clustering methods such as density-based methods and hierarchical methods. Grid-based clustering is studied in Section 10.5.
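The following sketch shows one way a grid can be combined with a density criterion. A cell is treated as dense when its object count reaches a threshold, and adjacent dense cells are then joined into clusters. This is a generic simplification for illustration, not a specific algorithm from Section 10.5. Several choices here are assumptions: the threshold `min_count`, the decision to treat diagonal neighbors as adjacent, and the function name `dense_cell_clusters`.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, ...]

def dense_cell_clusters(cell_counts: Dict[Cell, int], min_count: int) -> List[Set[Cell]]:
    """Connect adjacent dense cells into clusters.

    cell_counts maps a cell's integer grid coordinates to the number of objects in it.
    A cell is treated as dense when its count is at least min_count (an assumed threshold).
    """
    dense = {c for c, n in cell_counts.items() if n >= min_count}
    seen: Set[Cell] = set()
    clusters: List[Set[Cell]] = []
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over dense cells whose coordinates differ by at most 1
        # in every dimension (diagonal neighbors included).
        cluster = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(a + b for a, b in zip(cell, offset))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters

# Example: per-cell object counts on a 2-D grid; cell (2, 0) is below the threshold.
counts = {(0, 0): 5, (0, 1): 4, (1, 1): 6, (2, 0): 1, (3, 3): 5}
clusters = dense_cell_clusters(counts, min_count=3)
print(sorted(sorted(c) for c in clusters))
```

The search visits only dense cells and their immediate neighbors. Its cost therefore depends on the number of cells and dimensions, not on the number of data objects.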
These methods are briefly summarized in Figure 10.1. Some clustering algorithms integrate the ideas of several clustering methods, so it is sometimes difficult to classify a given algorithm as belonging to only one clustering method category. Furthermore, some applications may have clustering criteria that require the integration of several clustering techniques.
In the following sections, we examine each clustering method in detail. Advanced clustering methods and related issues are discussed in Chapter 11. In general, the notation used is as follows. Let D be a data set of n objects to be clustered. An object is described by d variables, where each variable is also called an attribute or a dimension.
Method                  General Characteristics
----------------------  ----------------------------------------------------------------
Partitioning methods    - Find mutually exclusive clusters of spherical shape
                        - Distance-based
                        - May use mean or medoid (etc.) to represent cluster center
                        - Effective for small- to medium-size data sets
Hierarchical methods    - Clustering is a hierarchical decomposition (i.e., multiple levels)
                        - Cannot correct erroneous merges or splits
                        - May incorporate other techniques like microclustering or
                          consider object “linkages”
Density-based methods   - Can find arbitrarily shaped clusters
                        - Clusters are dense regions of objects in space that are
                          separated by low-density regions
                        - Cluster density: Each point must have a minimum number of
                          points within its “neighborhood”
                        - May filter out outliers
Grid-based methods      - Use a multiresolution grid data structure
                        - Fast processing time (typically independent of the number of
                          data objects, yet dependent on grid size)

Figure 10.1 Overview of clustering methods discussed in this chapter. Note that some algorithms may combine various methods.
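A few lines of code can illustrate two characteristics from the table. The first is representing a partitioning cluster by its mean versus its medoid. The second is the density-based requirement that a point have a minimum number of neighbors. This minimal sketch assumes Euclidean distance and a closed radius-`eps` neighborhood that counts the point itself, which is the convention used by DBSCAN-style methods. The parameters `eps` and `min_pts` and all function names are illustrative choices, not definitions from the text.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def mean_center(cluster: List[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean; the result need not coincide with any object."""
    d = len(cluster[0])
    return tuple(sum(p[j] for p in cluster) / len(cluster) for j in range(d))

def medoid_center(cluster: List[Point]) -> Point:
    """Member object with the smallest total Euclidean distance to all members."""
    return min(cluster, key=lambda c: sum(math.dist(c, p) for p in cluster))

def has_dense_neighborhood(p: Point, data: List[Point], eps: float, min_pts: int) -> bool:
    """True if at least min_pts objects (p included) lie within distance eps of p."""
    return sum(1 for q in data if math.dist(p, q) <= eps) >= min_pts

# Example: four nearby objects on a line plus one distant object.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (20.0, 0.0)]
print(mean_center(cluster))    # pulled toward the distant object
print(medoid_center(cluster))  # an actual member of the cluster
print(has_dense_neighborhood((1.0, 0.0), cluster, eps=1.0, min_pts=3))
print(has_dense_neighborhood((20.0, 0.0), cluster, eps=1.0, min_pts=3))
```

In this example the single distant object pulls the mean to x = 5.2. The medoid, by contrast, stays at an actual object, (2.0, 0.0). The distant object also fails the neighborhood test, which is one way a density criterion can filter it out as an outlier.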