Mining Functions and Algorithms - Java Data Mining: Strategy, Standard, and Practice

income values and compute the distance between them. For categorical values, such as marital status, determining similarity can be difficult. Whereas we could say married is far from single, it is less clear how much closer divorced is to married or single. Also consider colors. Is green closer to red or blue? If considering the colors scientifically, we could use their frequency or wavelength. However, in considering customer color choices, this is likely irrelevant. In this case, we may conclude that two cases are similar for attribute color only if they have the same color.
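The contrast between numerical and categorical similarity can be sketched in a few lines of Java. This is only an illustration, not part of the JDM API: the class name, the method names, and the idea of scaling a numerical difference by an assumed attribute range are all assumptions made for the example. Categorical values use simple matching, so two colors are at distance 0 only when they are identical.

```java
import java.util.Objects;

// Illustrative sketch only; names and the range-scaling choice are assumptions.
class SimilaritySketch {

    // Numerical attribute (e.g., income): absolute difference divided by an
    // assumed attribute range, so values within that range map to [0, 1].
    public static double numericDistance(double a, double b, double range) {
        return Math.abs(a - b) / range;
    }

    // Categorical attribute (e.g., color): simple matching.
    // Identical values are at distance 0; any two different values are at distance 1.
    public static double categoricalDistance(String a, String b) {
        return Objects.equals(a, b) ? 0.0 : 1.0;
    }

    public static void main(String[] args) {
        // Two incomes, assuming incomes span a range of 100,000.
        System.out.println(numericDistance(40_000, 55_000, 100_000));
        // Different colors are maximally distant; equal colors are not distant at all.
        System.out.println(categoricalDistance("green", "red"));
        System.out.println(categoricalDistance("blue", "blue"));
    }
}
```

With simple matching, green is no closer to red than to blue, which reflects the view that wavelength is irrelevant to customer color choices.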

Some clustering algorithms produce hierarchies that characterize relationships among clusters. This can be useful when creating taxonomies for organizing documents or products, or trying to determine what clusters are most meaningful for a particular business problem.
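One way such a hierarchy can arise is through agglomerative merging: start with each case as its own cluster and repeatedly merge the two closest clusters. The sketch below is a minimal illustration under stated assumptions, not a JDM algorithm: it works on one-dimensional values, uses single linkage (the smallest gap between any two members) as the cluster distance, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; single linkage on 1-D values is an assumption.
class HierarchySketch {

    // Single-linkage distance: the smallest gap between any member of a and any member of b.
    public static double linkage(List<Double> a, List<Double> b) {
        double best = Double.MAX_VALUE;
        for (double x : a) {
            for (double y : b) {
                best = Math.min(best, Math.abs(x - y));
            }
        }
        return best;
    }

    // Repeatedly merges the two closest clusters and records each merge in order.
    // The recorded sequence of merges is the hierarchy, read bottom-up.
    public static List<String> mergeOrder(double[] values) {
        List<List<Double>> clusters = new ArrayList<>();
        for (double v : values) {
            List<Double> single = new ArrayList<>();
            single.add(v);
            clusters.add(single);
        }
        List<String> merges = new ArrayList<>();
        while (clusters.size() > 1) {
            int bi = 0;
            int bj = 1;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    if (linkage(clusters.get(i), clusters.get(j))
                            < linkage(clusters.get(bi), clusters.get(bj))) {
                        bi = i;
                        bj = j;
                    }
                }
            }
            merges.add(clusters.get(bi) + " + " + clusters.get(bj));
            clusters.get(bi).addAll(clusters.get(bj));
            clusters.remove(bj); // removes by index; bj > bi, so bi stays valid
        }
        return merges;
    }

    public static void main(String[] args) {
        for (String merge : mergeOrder(new double[] {1.0, 2.0, 10.0, 11.5})) {
            System.out.println(merge);
        }
    }
}
```

Early merges join the most similar cases, and later merges join progressively broader groups, which is what makes the result usable as a taxonomy.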

JDM defines algorithm settings for k-means [MacQueen 1967]. However, there are many other clustering algorithms such as self-organizing maps [Kohonen 1995], orthogonal partitioning clustering [Milenova/Campos 2002], and hierarchical clustering.
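To make the idea behind k-means concrete, the following plain-Java sketch alternates between assigning each case to its nearest centroid and recomputing each centroid as the mean of its assigned cases. It does not use the JDM API and is not the settings interface the standard defines; it is the common batch (Lloyd-style) formulation rather than MacQueen's original incremental update, and the class name, method names, and caller-supplied initial centroids are assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch only; a batch (Lloyd-style) k-means with caller-supplied centroids.
class KMeansSketch {

    // Squared Euclidean distance between two points of equal dimension.
    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    public static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs k-means, updating centroids in place, and returns the cluster index of each point.
    public static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dims = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centroids.length; c++) {
                if (counts[c] > 0) { // leave an empty cluster's centroid where it is
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.5, 2.0}, {8.0, 8.0}, {9.0, 9.5}};
        double[][] centroids = {{1.0, 1.0}, {9.0, 9.5}}; // k = 2, chosen by hand
        System.out.println(Arrays.toString(cluster(points, centroids, 10)));
    }
}
```

The number of clusters and the iteration limit are the kinds of choices an algorithm-settings object would typically carry; here they appear simply as the length of the centroid array and a method parameter.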

4.7 Summary

In this chapter, we introduced the mining functions supported in the first release of JDM and mentioned some of the algorithms that can be used to support those functions, both those defined for the standard as well as some other popular algorithms. We discussed each mining function's capabilities and typical uses. We looked at the basic data requirements and formats of each mining function and how results may be interpreted.

There are other mining functions, not currently defined in JDM, that are useful in various situations. Examples include time series analysis [Chatfield 2004], to understand trends and cycles of numerical sequence-oriented data; anomaly detection, to identify unusual cases based on patterns identified to be normal; and feature extraction, to determine higher level attributes or features as linear combinations of the original attributes. In Chapter 18, we discuss some of the new features, like these, being considered for JDM 2.0.

For additional information on mining functions, see [Berry/Linoff 2004] and [Witten/Frank 2005]. In the next chapter, we look at the overall strategy adopted for JDM, of which these mining functions form a part.
