Mining Functions and Algorithms - Java Data Mining: Strategy, Standard, and Practice

Java Reference

In-Depth Information

of their business strategy need to understand the more general class

of problems they will encounter when trying to mine data in new sit-

uations. In data mining, there are several fundamental functions and

associated algorithms. In this chapter, we characterize mining func-

tions at a high level along several dimensions, and explain the data

mining functions provided in JDM. These mining functions include

classification , regression, attribute importance, clustering, and association .

In the first version of JDM, the expert group felt these five functions

were most commonly used, as well as most mature and amenable for

standardization. Moreover, these functions form a core supporting

many common data mining solutions. We also note algorithms that

are typically associated with these mining functions, especially those

specified in the standard.

4.1

Data Mining Functions

As introduced in Chapter 1, data mining functions can be classified

along several dimensions. For example, supervised and unsupervised ,

descriptive and predictive , transparent and opaque . At some level, all

mining functions are implemented using one or more algorithms,

whether the algorithm details are exposed to the user or not. The

choice of algorithm can impact some of these dimensions, as we

discuss below.

Supervised functions are typically used to predict a value and

require the user to specify a known outcome or target for each case to

be used for model building. Examples of targets include binary

attributes with categories indicating buy/no-buy, churn/no-churn,

and success/failure. A target may also be a multiclass attribute, con-

taining multiple values, for example, indicating likely salary ranges

in $30,000 increments; expected reaction to a drug such as highly

favorable , favorable, no response, unfavorable, or highly unfavorable; or

favorite color. A target may also be a continuous numerical value, for

example, house price, temperature, or the number of copies of a topic

to print. The target allows a supervised data mining algorithm to

learn from “correct” or “actual” examples, those with known out-

comes. Some algorithms assess how well they predict the provided

target values as they build models and adjust the resulting model

accordingly. Others simply count co-occurrences between values in

each non-target attribute, called a predictor attribute , and target

attribute values, ignoring how well it is able to predict.

Supervised functions are of two types: classification and regression .

Classification predicts categorical values; regression predicts continuous

Search WWH ::

Custom Search

Home