focus on understanding the way the underlying data operates while
prediction-oriented methods aim to build a behavioral model for obtaining
new and unseen samples and for predicting values of one or more variables
related to the sample. Some prediction-oriented methods, however, can also
contribute to the understanding of the data.
While most of the discovery-oriented techniques use inductive learning
as discussed above, verification methods evaluate a hypothesis proposed
by an external source, such as an expert. These techniques include the most
common methods of traditional statistics, like the goodness-of-fit test, the
t-test of means and analysis of variance. These methods are not as much
related to data mining as are their discovery-oriented counterparts because
most data mining problems are concerned with selecting a hypothesis (out
of a set of hypotheses) rather than testing a known one. While one of the
main objectives of data mining is model identification, statistical methods
usually focus on model estimation [Elder and Pregibon (1996)].
1.6 Supervised Methods
1.6.1 Overview
In the machine learning community, prediction methods are commonly
referred to as supervised learning. Supervised learning stands in opposition
to unsupervised learning, which refers to modeling the distribution of
instances in a typical, high-dimensional input space.
According to Kohavi and Provost (1998), the term “unsupervised
learning” refers to “learning techniques that group instances without a
prespecified dependent attribute”. Thus, the term “unsupervised learning”
covers only a portion of the description methods presented in Figure 1.3. For
instance, the term covers clustering methods but not visualization methods.
Supervised methods attempt to discover the relationship
between input attributes (sometimes called independent variables)
and a target attribute (sometimes referred to as a dependent variable).
The relationship that is discovered is represented in a structure referred
to as a Model. Usually, models describe and explain phenomena which are
hidden in the dataset and which can be used for predicting the value of
the target attribute whenever the values of the input attributes are known.
Supervised methods can be applied in a variety of domains such
as marketing, finance and manufacturing.
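
As a rough illustration of the terms above, the following minimal sketch induces a model from a handful of labeled instances and then uses it to predict the target attribute for unseen input attributes. The toy dataset, the attribute meanings, the function name induce_model and the choice of a 1-nearest-neighbour rule are all assumptions made for this example; they are not taken from the text, and any other induction method could stand in.

```python
# Hypothetical toy training set: each instance pairs its input attributes
# (age, income in thousands) with a known target attribute ("yes"/"no").
training_set = [
    ((25, 30), "no"),
    ((32, 45), "no"),
    ((47, 80), "yes"),
    ((52, 95), "yes"),
]


def induce_model(instances):
    """Return a model (a function from input attributes to a target value).

    The model is a 1-nearest-neighbour rule, chosen only for brevity.
    """
    def model(inputs):
        def squared_distance(instance):
            attributes, _ = instance
            return sum((a - b) ** 2 for a, b in zip(attributes, inputs))

        # Predict the target value of the closest training instance.
        _, target = min(instances, key=squared_distance)
        return target

    return model


model = induce_model(training_set)
# The target attribute is predicted from known input attributes alone.
prediction = model((50, 90))
```

Here the structure returned by induce_model plays the role of the Model: once induced, it can be applied to any instance whose input attributes are known.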
It is useful to distinguish between two main supervised models: Classification
Models (Classifiers) and Regression Models. Regression models