Figure 7.1: Connectedness
Understanding. Evaluating differences between clusters may provide
insights into the entire population and can guide future data analysis
directions. Biologists have learned, for example, that it is not advantageous
to study all forms of life as a whole; rather, it is better to create taxonomies
and then study individual groupings within the taxonomy.
Data aggregation. When mining very large datasets, observations within
clusters may be aggregated, thus reducing datasets with thousands or even
millions of observations down to one observation per cluster.
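
As a rough illustration of the aggregation idea, the sketch below collapses a labeled dataset to one representative observation per cluster. The text does not say how the representative is computed; using the per-cluster mean is an assumption made here, and the function name `aggregate_by_cluster` and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_cluster(observations, labels):
    """Return one mean vector per cluster label (the mean is an assumed choice)."""
    groups = defaultdict(list)
    for obs, label in zip(observations, labels):
        groups[label].append(obs)
    # zip(*rows) transposes the rows so each column can be averaged.
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


# Hypothetical data: five observations already assigned to two clusters.
observations = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0), (14.0, 12.0)]
labels = [0, 0, 1, 1, 1]

centroids = aggregate_by_cluster(observations, labels)
print(centroids)
```

Here five observations shrink to two, one per cluster. Other summaries, such as the median or a medoid, would fit the same pattern.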
The notion of what constitutes a cluster is vaguely defined. For example,
consider the points in Figure 7.1. How many clusters do you see? Your answer is
probably three based on the three sequences of points. In this case, your brain
has clustered the points based on a perceived visual connectedness.
Now consider the points in Figure 7.2. How many clusters do you see? The
answer here may be more debatable. Are there two clusters, one on the left and
the other on the right, or are there four clusters, two on the left and two on the
right? Either way, the clusters identified are based on proximity.
Consider the points in Figure 7.3. You will probably agree that there are two
clusters here, but what are the criteria used? Identification of the inner cluster is
based on proximity. Yet if proximity alone were applied to points A and B, they
would not be placed in the same cluster. The outer cluster is instead based on
connectedness.
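
The difference between proximity and connectedness can be made concrete with a small sketch. It is not the book's algorithm: it simply treats two points as connected when a chain of points links them and every step in the chain is no longer than a threshold. The function name `connected_clusters`, the threshold `max_gap`, and the ring-plus-blob data (a stand-in for the kind of layout described for Figure 7.3) are all assumptions for illustration.

```python
import math


def connected_clusters(points, max_gap):
    """Group point indices so that any two points in a group are linked by a
    chain of points whose consecutive distances are all <= max_gap."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, frontier = [start], [start]
        while frontier:
            i = frontier.pop()
            # Unvisited points within one step of point i join its cluster.
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= max_gap]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical layout: 12 points on a ring of radius 5 around a small inner blob.
ring = [(5 * math.cos(2 * math.pi * k / 12), 5 * math.sin(2 * math.pi * k / 12))
        for k in range(12)]
inner = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0)]
points = ring + inner

clusters = connected_clusters(points, max_gap=3.0)
print(len(clusters))

# Points 0 and 6 sit on opposite sides of the ring, far apart by direct distance.
print(math.dist(points[0], points[6]) > 3.0)
```

With these values the ring and the inner blob come out as two clusters. Points 0 and 6 play the role of A and B: they are about 10 units apart, well beyond `max_gap`, yet they share a cluster because neighboring ring points chain them together.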
Figure 7.2: Proximity