Our visualization technique involves a smart reordering of the similarity
matrix. Ordering data points for visualization has previously been used
in conjunction with clustering in different contexts. In cluster analysis of
genome data [3.21], reordering the primary data matrix and representing it
graphically have been explored. That visualization takes place in the primary
data space rather than in the relationship (similarity) space. Sparse primary
data-matrix reorderings have also been considered for browsing hypertext [3.46].
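The basic effect of reordering a similarity matrix can be sketched in a few lines. This is a minimal illustration, not the Clusion implementation, and it assumes a precomputed symmetric similarity matrix plus one integer cluster label per point. Sorting the point indices by label and applying that one permutation to both rows and columns turns each cluster into a contiguous block on the diagonal. The names `reorder_similarity` and `render_ascii` are hypothetical, and the ASCII rendering stands in for the gray-level image a real tool would draw.

```python
from typing import List, Sequence, Tuple


def reorder_similarity(
    sim: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[List[int], List[List[float]]]:
    """Permute rows and columns of a symmetric similarity matrix so that
    points with the same cluster label become contiguous."""
    # Stable sort of point indices by cluster label: within a cluster,
    # points keep their original relative order.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    # Apply the same permutation to rows and columns, so the matrix stays
    # symmetric and each cluster becomes a block on the diagonal.
    reordered = [[sim[i][j] for j in order] for i in order]
    return order, reordered


def render_ascii(matrix: Sequence[Sequence[float]], threshold: float = 0.5) -> str:
    """Crude text rendering: '#' where similarity >= threshold, '.' elsewhere."""
    return "\n".join(
        "".join("#" if v >= threshold else "." for v in row) for row in matrix
    )


# Toy data: points 0 and 2 are similar, as are points 1 and 3.
sim = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
labels = [1, 0, 1, 0]

print(render_ascii(sim))        # original order: interleaved pattern, no blocks
order, reordered = reorder_similarity(sim, labels)
print(order)                    # [1, 3, 0, 2]
print(render_ascii(reordered))  # two 2x2 blocks on the diagonal
```

In the original order the two groups are interleaved (`#.#.` / `.#.#`); after reordering the output is `##..`, `##..`, `..##`, `..##`, so the grouping becomes visible at a glance even though no similarity value changed.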
A useful survey of visualization methods for data mining in general can
be found in [3.47]. The popular books by E. Tufte on visualizing information
are also recommended.
3.8 Concluding Remarks
A poll in June 2001 by KDNuggets (http://www.kdnuggets.com/) indicated
that clustering was by far the most popular type of analysis over the preceding 12
months, at 22% (followed by direct marketing at 14% and cross-sell models
at 12%). The clustering process is characterized by extensive exploratory
periods during which better domain understanding is gained. Often, in this iterative
process, the crucially important definitions of features or similarity are
refined. The visualization toolkit Clusion allows even nonspecialists to get
an intuitive visual impression of the grouping nature of objects that may
originally be defined in a high-dimensional space. Taking Clusion from a
postprocessing step into the loop can significantly accelerate the process of
discovering domain knowledge, as it provides a powerful visual aid for
assessing and improving clustering. For example, actionable recommendations
for splitting or merging clusters can be derived and applied through a
point-and-click user interface, and different similarity metrics can be
compared visually. It also guides the user toward the “right number” of
clusters. A demo and selected code of this tool can be found at
http://www.strehl.com/.
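What the eye judges in a reordered similarity matrix can also be summarized numerically. The sketch below is an assumption for illustration, not a feature of Clusion: it compares the mean similarity inside the diagonal blocks (within-cluster pairs) with the mean similarity outside them (between-cluster pairs). A labeling whose within-cluster mean clearly exceeds its between-cluster mean produces crisp blocks, so comparing these two numbers across candidate labelings (for example, different numbers of clusters) is a rough, hypothetical counterpart to the visual check. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def _mean(xs: List[float]) -> float:
    # NaN signals "no such pairs", e.g. between-cluster pairs when k == 1.
    return sum(xs) / len(xs) if xs else float("nan")


def block_contrast(
    sim: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[float, float]:
    """Mean similarity over within-cluster pairs and over between-cluster
    pairs (distinct points only, so the diagonal is ignored)."""
    within: List[float] = []
    between: List[float] = []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(sim[i][j])
    return _mean(within), _mean(between)


sim = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
for labels in ([1, 0, 1, 0], [0, 0, 1, 1]):
    w, b = block_contrast(sim, labels)
    print(labels, f"within={w:.3f} between={b:.3f}")
# [1, 0, 1, 0] within=0.850 between=0.125
# [0, 0, 1, 1] within=0.100 between=0.500
```

The first labeling yields high contrast (crisp diagonal blocks); the second has lower similarity inside its clusters than between them, which in the picture would show up as bright off-diagonal regions suggesting a merge or a different split.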
The clustering algorithm presented is largely geared toward the needs of
segmenting transactional data, with provisions for obtaining balanced clusters
and for selecting the quantity of interest (e.g., revenue or margins) that should
influence the grouping. Thus, rather than evaluating business objectives (such as
revenue contribution) after clustering is done, we integrate them directly into the
clustering algorithm. Moreover, the algorithm fits naturally with the visualization
technique. We also examined several ways of scaling the clustering routine to a
large number of data points and elaborated on one approach that is able to
use sampling effectively because of the balanced nature of the desired clusters.
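Why balance helps sampling can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the chapter's algorithm: balance is measured here as average cluster size divided by largest cluster size (1.0 means perfectly balanced), and scaling uses a generic "cluster a sample, then assign the rest" scheme in which each remaining point joins the sampled cluster with the highest mean similarity. `cluster_fn`, `sim_fn`, and all function names are hypothetical, and 1-D points are used only for brevity. The link to balance: when no cluster is tiny, a uniform sample of moderate size is likely to contain members of every cluster, so no cluster is lost in the sampling step.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence


def balance(labels: Sequence[int]) -> float:
    """Average cluster size divided by the largest cluster size
    (1.0 = perfectly balanced; small values = one cluster dominates)."""
    counts = Counter(labels)
    return (len(labels) / len(counts)) / max(counts.values())


def assign_to_sampled_clusters(
    points: Sequence[float],
    sample_idx: Sequence[int],
    sample_labels: Sequence[int],
    sim_fn: Callable[[float, float], float],
) -> List[int]:
    """Keep the labels of sampled points; every other point joins the
    sampled cluster whose members have the highest mean similarity to it."""
    members: Dict[int, List[float]] = {}
    for idx, lab in zip(sample_idx, sample_labels):
        members.setdefault(lab, []).append(points[idx])

    def mean_sim(p: float, lab: int) -> float:
        return sum(sim_fn(p, q) for q in members[lab]) / len(members[lab])

    known = dict(zip(sample_idx, sample_labels))
    return [
        known[i] if i in known else max(members, key=lambda lab: mean_sim(p, lab))
        for i, p in enumerate(points)
    ]


def sample_then_assign(
    points: Sequence[float],
    k: int,
    cluster_fn: Callable[[List[float], int], List[int]],
    sim_fn: Callable[[float, float], float],
    sample_size: int,
    seed: int = 0,
) -> List[int]:
    """Run the expensive (hypothetical) cluster_fn on a uniform random
    sample only, then extend its labels to all points."""
    rng = random.Random(seed)
    sample_idx = rng.sample(range(len(points)), sample_size)
    sample_labels = cluster_fn([points[i] for i in sample_idx], k)
    return assign_to_sampled_clusters(points, sample_idx, sample_labels, sim_fn)


def similarity(a: float, b: float) -> float:
    # Toy similarity in (0, 1] for 1-D points.
    return 1.0 / (1.0 + abs(a - b))


points = [0.0, 0.2, 0.4, 0.6, 9.0, 9.2, 9.4, 9.6]
# Pretend the sample was points 0, 3, 4, 7 and the clusterer labelled them 0, 0, 1, 1.
labels = assign_to_sampled_clusters(points, [0, 3, 4, 7], [0, 0, 1, 1], similarity)
print(labels)                 # [0, 0, 0, 0, 1, 1, 1, 1]
print(balance(labels))        # 1.0
print(balance([0, 0, 0, 1]))  # 0.6666666666666666
```

In this sketch, a cluster that is absent from the sample can never be recovered in the assignment step, which is exactly the failure mode that balanced clusters make unlikely.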
Acknowledgments
We want to express our gratitude to Mark Davis of Knowledge Discovery
1 (since then acquired by Net Perceptions) for providing the drugstore re-