7
Cluster Analysis
Introduction
Cluster analysis is the process of grouping observations based on similarity
(visually observed as proximity), connectedness, or density. The results of a
cluster analysis are called a clustering.
Cluster analysis is similar in concept to the previously discussed process of
classification. In classification, the observation groupings (classifications) are
known a priori. The objective of classification analysis is to discover relationships
between other dataset attributes and the previously known class attribute
that could be used to predict class membership. However, in cluster analysis the
groupings are not previously known. The objective is the discovery of clusters
of observations grouped according to dataset attribute values.
In data mining, there are a number of potential objectives in conducting a
cluster analysis.
Sub-population identification and isolation. As has been discussed in previous
chapters, datasets may be composed of observations drawn from populations
with different characteristics. Relationships that hold only within a single subset may
not be as readily identified when exploring the full dataset as when exploring that
subset alone. Hence, a good rule of thumb is to isolate the subsets and then analyze
each individually. A strategy in product marketing is to first segment the market,
then develop specific promotions for selected market segments. The same
principle may be applied to data mining: isolate subsets, then develop custom
analysis plans for each.
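
The effect described above can be illustrated with a small sketch. The data, the helper names (pearson, two_means), and the choice of a simplified two-cluster k-means are all assumptions made for illustration; they are not taken from the text. The dataset is built from two hypothetical sub-populations, one with a rising relationship between x and y and one with a falling relationship, so the relationship cancels out in the full dataset but reappears once the subsets are isolated.

```python
from math import sqrt


def pearson(points):
    """Pearson correlation between the x and y values of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    return cov / sqrt(var_x * var_y)


def two_means(points, iterations=10):
    """Simplified k-means with k=2 (illustrative only).

    Seeds the centroids with the first and last points and assumes that
    neither cluster ever becomes empty, which holds for the data below.
    """
    centroids = [points[0], points[-1]]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Squared Euclidean distance from p to each centroid.
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters
        ]
    return clusters


# Hypothetical sub-populations: y rises with x in one, falls with x in the other.
subset_a = [(x, x) for x in range(5)]
subset_b = [(x, 14 - x) for x in range(10, 15)]
dataset = subset_a + subset_b

full_r = pearson(dataset)                    # relationship in the full dataset
clusters = two_means(dataset)                # isolate the subsets without labels
cluster_r = [pearson(c) for c in clusters]   # relationship within each subset

print("full dataset:", full_r)
print("per cluster: ", cluster_r)
```

In this constructed case the correlation over the full dataset is zero, while each recovered cluster shows a perfect linear relationship (positive in one, negative in the other). Real data will rarely separate this cleanly, and the choice of clustering method and number of clusters would need to be justified for the dataset at hand.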
 