income values and compute the distance between them. For categorical
values, such as marital status, determining similarity can be difficult.
Although we could say that married is far from single, it is less clear
how much closer divorced is to married or to single. Also consider colors.
Is green closer to red or to blue? Viewed scientifically, colors could be
compared by their frequency or wavelength. However, in considering
customer color choices, this is likely irrelevant. In this case, we may
conclude that two cases are similar for the attribute color only if they
have the same color.
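The distinction between numerical and categorical similarity can be sketched in code. The following is a minimal illustration in plain Java, not part of the JDM API: the class name MixedDistance, the range-scaled numerical distance, and the equal weighting of the three attributes are all assumptions made for this example. Categorical attributes use simple matching, so two values are at distance 0 only if they are identical, as suggested above for color.

```java
public class MixedDistance {

    // Numerical attribute: absolute difference divided by an assumed
    // value range, capped at 1 so every attribute contributes 0..1.
    static double numericDistance(double a, double b, double range) {
        return Math.min(1.0, Math.abs(a - b) / range);
    }

    // Categorical attribute: simple matching. Identical values are at
    // distance 0; any two different values are at distance 1.
    static double categoricalDistance(String a, String b) {
        return a.equals(b) ? 0.0 : 1.0;
    }

    // Overall distance between two cases: the unweighted average of the
    // per-attribute distances (an assumption; weights could differ).
    static double distance(double incomeA, String maritalA, String colorA,
                           double incomeB, String maritalB, String colorB,
                           double incomeRange) {
        double sum = numericDistance(incomeA, incomeB, incomeRange)
                   + categoricalDistance(maritalA, maritalB)
                   + categoricalDistance(colorA, colorB);
        return sum / 3.0;
    }

    public static void main(String[] args) {
        // Two hypothetical customers; the income range of 100000 is assumed.
        double d = distance(30000, "married", "green",
                            50000, "single", "green",
                            100000);
        System.out.println("distance = " + d);
    }
}
```

Under simple matching, divorced is exactly as far from married as it is from single. This sidesteps the question of which is closer rather than answering it.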
Some clustering algorithms produce hierarchies that characterize
relationships among clusters. This can be useful when creating
taxonomies for organizing documents or products, or trying to
determine what clusters are most meaningful for a particular business.
JDM defines algorithm settings for k-means [MacQueen 1967].
However, there are many other clustering algorithms such as self-
organizing maps [Kohonen 1995], orthogonal partitioning clustering
[Milenova/Campos 2002], and hierarchical clustering.
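As a rough illustration of the k-means approach mentioned above, the following plain Java sketch alternates between assigning each value to its nearest centroid and recomputing each centroid as the mean of its assigned values. It does not use the JDM API. The class KMeans1D, its method names, the restriction to one-dimensional data, and the caller-supplied initial centroids are assumptions chosen to keep the example short.

```java
import java.util.Arrays;

public class KMeans1D {

    // Index of the centroid closest to x (ties go to the lower index).
    static int nearest(double[] centroids, double x) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs assignment and update steps until the centroids stop changing
    // or maxIterations is reached, then returns the final centroids.
    static double[] cluster(double[] data, double[] initialCentroids, int maxIterations) {
        double[] centroids = initialCentroids.clone();
        int k = centroids.length;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            // Assignment step: each value joins its nearest centroid.
            for (double x : data) {
                int c = nearest(centroids, x);
                sum[c] += x;
                count[c]++;
            }
            // Update step: each centroid moves to the mean of its members.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) {
                    changed = true;
                }
                centroids[c] = updated;
            }
            if (!changed) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12};
        double[] result = cluster(data, new double[] {0, 5}, 10);
        System.out.println(Arrays.toString(result));
    }
}
```

The result depends on the initial centroids and on the number of clusters k, which is why settings of this kind are typically exposed to the user.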
In this chapter, we introduced the mining functions supported in the
first release of JDM and mentioned some of the algorithms that can
be used to support those functions, both those defined for the
standard and some other popular algorithms. We discussed each
mining function's capabilities and typical uses. We looked at the
basic data requirements and formats of each mining function and
how results may be interpreted.
There are other mining functions, not currently defined in JDM,
that are useful in various situations. Examples include time series
analysis [Chatfield 2004], to understand trends and cycles of
numerical sequence-oriented data; anomaly detection, to identify unusual
cases based on patterns identified as normal; and feature extraction,
to determine higher-level attributes or features as linear
combinations of the original attributes. In Chapter 18, we discuss some of
the new features, such as these, being considered for JDM 2.0.
For additional information on mining functions, see
[Berry/Linoff 2004] and [Witten/Frank 2005]. In the next chapter, we look at the
overall strategy adopted for JDM, of which these mining functions
form a part.