of their business strategy need to understand the more general class
of problems they will encounter when trying to mine data in new
situations. In data mining, there are several fundamental functions and
associated algorithms. In this chapter, we characterize mining
functions at a high level along several dimensions, and explain the data
mining functions provided in JDM. These mining functions include
classification, regression, attribute importance, clustering, and association.
In the first version of JDM, the expert group considered these five functions
the most commonly used, as well as the most mature and the most amenable to
standardization. Moreover, these functions form a core supporting
many common data mining solutions. We also note algorithms that
are typically associated with these mining functions, especially those
specified in the standard.
4.1 Data Mining Functions
As introduced in Chapter 1, data mining functions can be classified
along several dimensions, for example, supervised or unsupervised,
descriptive or predictive, and transparent or opaque. At some level, all
mining functions are implemented using one or more algorithms,
whether the algorithm details are exposed to the user or not. The
choice of algorithm can impact some of these dimensions, as we
discuss below.
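To make the point about algorithm choice concrete, the sketch below models a function and an algorithm as separate descriptors: whether a function is supervised is fixed by the function, while transparency is attached to the algorithm. It is a hypothetical illustration, not part of the JDM API; the names `FunctionSpec`, `AlgorithmSpec`, and `describe` are our own, and it assumes the common view that a decision tree produces human-readable rules (transparent) while a neural network does not (opaque). It uses records, so it assumes Java 16 or later.

```java
import java.util.List;

public class FunctionDimensions {
    // Hypothetical descriptor: supervision is a property of the mining function.
    record FunctionSpec(String name, boolean supervised) {}

    // Hypothetical descriptor: transparency is a property of the chosen algorithm.
    record AlgorithmSpec(String name, boolean transparent) {}

    // Combine both descriptors into a one-line characterization.
    static String describe(FunctionSpec f, AlgorithmSpec a) {
        return f.name() + " via " + a.name() + ": "
                + (f.supervised() ? "supervised" : "unsupervised") + ", "
                + (a.transparent() ? "transparent" : "opaque");
    }

    public static void main(String[] args) {
        FunctionSpec classification = new FunctionSpec("classification", true);
        // Same function, two algorithms: only the transparency dimension changes.
        for (AlgorithmSpec a : List.of(new AlgorithmSpec("decision tree", true),
                                       new AlgorithmSpec("neural network", false))) {
            System.out.println(describe(classification, a));
        }
    }
}
```

Running it prints one line per algorithm; the "supervised" label stays the same in both, while "transparent" becomes "opaque" when the algorithm changes.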
Supervised functions are typically used to predict a value and
require the user to specify a known outcome or target for each case to
be used for model building. Examples of targets include binary
attributes with categories indicating buy/no-buy, churn/no-churn,
and success/failure. A target may also be a multiclass attribute,
containing multiple values, for example, indicating likely salary ranges
in $30,000 increments; expected reaction to a drug such as highly
favorable, favorable, no response, unfavorable, or highly unfavorable; or
favorite color. A target may also be a continuous numerical value, for
example, house price, temperature, or the number of copies of a topic
to print. The target allows a supervised data mining algorithm to
learn from “correct” or “actual” examples, those with known
outcomes. Some algorithms assess how well they predict the provided
target values as they build models and adjust the resulting model
accordingly. Others simply count co-occurrences between the values of
each non-target attribute, called a predictor attribute, and the target
attribute values, without regard to how well the resulting model predicts.
Supervised functions are of two types: classification and regression.
Classification predicts categorical values; regression predicts continuous