Introduction to Decision Trees - Data Mining with Decision Trees: Theory and Applications

focus on understanding the way the underlying data operates while

prediction-oriented methods aim to build a behavioral model for obtaining

new and unseen samples and for predicting values of one or more variables

related to the sample. Some prediction-oriented methods, however, can also

contribute to the understanding of the data.

While most of the discovery-oriented techniques use inductive learning

as discussed above, verification methods evaluate a hypothesis proposed

by an external source, such as an expert. These techniques include the most

common methods of traditional statistics, like the goodness-of-fit test, the

t-test of means and analysis of variance. These methods are not as much

related to data mining as are their discovery-oriented counterparts because

most data mining problems are concerned with selecting a hypothesis (out

of a set of hypotheses) rather than testing a known one. While one of the

main objectives of data mining is model identification, statistical methods

usually focus on model estimation [Elder and Pregibon (1996)].
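
As an illustration of a verification method, the following minimal sketch computes the statistic for a two-sample t-test of means. It assumes equal variances in both groups (the pooled, Student form of the test); the function name `two_sample_t` and the two small samples are hypothetical and are not taken from the text.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(a, b):
    """Return the pooled-variance t statistic for the hypothesis mean(a) == mean(b)."""
    na, nb = len(a), len(b)
    # Pooled sample variance: assumes both groups share one true variance.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical measurements for two groups (illustrative values only).
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [5.8, 6.0, 5.7, 6.1, 5.9]

t_stat = two_sample_t(group_a, group_b)
print(f"t = {t_stat:.3f}")
```

The hypothesis (equal means) is supplied in advance and only tested; to reach a decision, the statistic would be compared against a t distribution with `len(group_a) + len(group_b) - 2` degrees of freedom.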

1.6 Supervised Methods

1.6.1 Overview

In the machine learning community, prediction methods are commonly

referred to as supervised learning. Supervised learning stands in opposition

to unsupervised learning, which refers to modeling the distribution of

instances in a typical, high-dimensional input space.

According to Kohavi and Provost (1998), the term “unsupervised

learning” refers to “learning techniques that group instances without a

prespecified dependent attribute”. Thus, the term “unsupervised learning”

covers only a portion of the description methods presented in Figure 1.3. For

instance, the term covers clustering methods but not visualization methods.
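
To make the idea of grouping instances without a dependent attribute concrete, here is a minimal sketch of one clustering method, k-means on a single numeric attribute. It is only one of many clustering techniques; the function name `kmeans_1d`, the starting centers and the data values are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group 1-D values around k centers; no target attribute is used.

    Assumes iterations >= 1 and that `centers` holds the initial guesses.
    """
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each instance to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled instances with two visible clumps.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, groups = kmeans_1d(values, centers=[0.0, 10.0])
print(centers, groups)
```

Note that the input contains only attribute values: the groups emerge from the data alone, which is what separates this from the supervised setting.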

Supervised methods are methods that attempt to discover the relationship

between input attributes (sometimes called independent variables)

and a target attribute (sometimes referred to as a dependent variable).

The relationship that is discovered is represented in a structure referred

to as a Model. Usually, models describe and explain phenomena which are

hidden in the dataset and which can be used for predicting the value of

the target attribute whenever the values of the input attributes are known.

Supervised methods can be applied in a variety of domains such

as marketing, finance and manufacturing.
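
The relationship between input attributes and a target attribute can be sketched with a very small model. The example below fits a one-split decision stump (a decision tree with a single test) by choosing the attribute and threshold with the fewest training errors. It is a simplified illustration, not the induction algorithm developed in this book; the names `fit_stump` and `predict`, the attributes `age` and `income`, and all values are hypothetical.

```python
def fit_stump(rows, target):
    """Return the single (attribute, threshold) split with the fewest training errors."""
    best = None
    for attr in rows[0]:
        for threshold in sorted({r[attr] for r in rows}):
            left = [t for r, t in zip(rows, target) if r[attr] <= threshold]
            right = [t for r, t in zip(rows, target) if r[attr] > threshold]
            if not left or not right:
                continue  # a split must send instances to both sides
            # Each side predicts its most frequent target value.
            left_label = max(set(left), key=left.count)
            right_label = max(set(right), key=right.count)
            errors = sum(t != left_label for t in left)
            errors += sum(t != right_label for t in right)
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_label, right_label)
    _, attr, threshold, left_label, right_label = best
    return {"attr": attr, "threshold": threshold,
            "left": left_label, "right": right_label}


def predict(model, row):
    """Predict the target attribute from the known input attributes."""
    side = "left" if row[model["attr"]] <= model["threshold"] else "right"
    return model[side]


# Hypothetical training set: input attributes plus a known target value.
rows = [
    {"age": 25, "income": 20},
    {"age": 47, "income": 25},
    {"age": 33, "income": 60},
    {"age": 52, "income": 75},
]
target = ["no", "no", "yes", "yes"]

model = fit_stump(rows, target)
print(model)
print(predict(model, {"age": 40, "income": 70}))
```

The returned dictionary plays the role of the Model: it summarizes the discovered relationship and can be applied to a new instance whose input attribute values are known.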

It is useful to distinguish between two main supervised models: Classification

Models (Classifiers) and Regression Models. Regression models
