3.2 Definition of the Classification Problem
The machine learning community was among the first to introduce the problem of concept learning. Concepts are mental categories for objects, events, or ideas that have a common set of features. According to Mitchell (1997): "each concept can be viewed as describing some subset of objects or events defined over a larger set" (e.g. the subset of vehicles that constitutes trucks). To learn a concept is to infer its general definition from a set of examples. This definition may be either explicitly formulated or left implicit, but either way it assigns each possible example to the concept or not. Thus, a concept can be regarded as a function from the instance space to the Boolean set, namely $c : X \to \{-1, 1\}$. Alternatively, one can refer to a concept $c$ as a subset of $X$, namely $\{x \in X : c(x) = 1\}$. A concept class $C$ is a set of concepts.
Other communities, such as the KDD community, prefer to deal with a straightforward extension of concept learning, known as the classification task. In this case, we search for a function that maps the set of all possible examples into a predefined set of class labels, which is not limited to the Boolean set. Most frequently, the goal of classifier inducers is formally defined as follows:
Given a training set $S$ with input attribute set $A = \{a_1, a_2, \ldots, a_n\}$ and a nominal target attribute $y$ from an unknown fixed distribution $D$ over the labeled instance space, the goal is to induce an optimal classifier with minimum generalization error.
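As a hedged illustration of this setting (a sketch, not the book's procedure), the snippet below uses scikit-learn's DecisionTreeClassifier as one possible inducer; the training set $S$, the two attributes, and the three class labels are invented for the example:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set S: each row is an instance described by attributes a_1, a_2;
# S_y holds the nominal target labels (three classes, so not a Boolean set).
S_X = [[0, 0], [0, 1], [1, 0], [1, 1]]
S_y = ["red", "red", "green", "blue"]

# The inducer builds a classifier DT(S) from the training set; it can only
# approximate the classifier with minimum generalization error, since the
# underlying distribution D is unknown.
dt = DecisionTreeClassifier().fit(S_X, S_y)
print(dt.predict([[1, 1]]))  # DT(S)(x) for an instance x
```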
The generalization error is defined as the misclassification rate over the distribution $D$. In the case of nominal attributes it can be expressed as:

$$\varepsilon(DT(S), D) = \sum_{\langle x, y \rangle \in U} D(x, y) \cdot L(y, DT(S)(x)), \qquad (3.1)$$
where $U$ denotes the labeled instance space and $L(y, DT(S)(x))$ is the zero-one loss function defined as:
$$L(y, DT(S)(x)) = \begin{cases} 0 & \text{if } y = DT(S)(x) \\ 1 & \text{if } y \neq DT(S)(x) \end{cases} \qquad (3.2)$$
In the case of numeric attributes, the sum operator is replaced with the integration operator.
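Since $D$ is unknown in practice, the generalization error of Eq. (3.1) is usually estimated by averaging the zero-one loss of Eq. (3.2) over a held-out sample drawn from $D$. A minimal sketch of that estimate (function names and data are illustrative assumptions):

```python
def zero_one_loss(y_true, y_pred):
    """Eq. (3.2): 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y_true == y_pred else 1

def estimated_generalization_error(classifier, sample):
    """Empirical misclassification rate over labeled pairs (x, y)."""
    losses = [zero_one_loss(y, classifier(x)) for x, y in sample]
    return sum(losses) / len(losses)

# Usage with any callable playing the role of DT(S):
held_out = [([0, 1], "red"), ([1, 0], "green"), ([1, 1], "red")]
DT_S = lambda x: "red" if x[0] == 0 else "green"
print(estimated_generalization_error(DT_S, held_out))  # 0.333...: one miss
```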