Database Reference
In-Depth Information
3.2 Definition of the Classification Problem
The machine learning community was among the first to introduce the
problem of concept learning . Concepts are mental categories for objects,
events, or ideas that have a common set of features. According to Mitchell
(1997): “each concept can be viewed as describing some subset of objects or
events defined over a larger set” (e.g. the subset of a vehicle that constitutes
trucks). To learn a concept is to infer its general definition from a set
of examples. This definition may be either explicitly formulated or left
implicit, but either way it assigns each possible example to the concept
or not. Thus, a concept can be regarded as a function from the instance
space to the Boolean set, namely: c : X
→{−
1 , 1
}
. Alternatively, one can
refer a concept c as a subset of X ,namely:
{
x
X : c ( x )=1
}
.A concept
class C is a set of concepts.
To learn a concept is to infer its general definition from a set of
examples. This definition may be either explicitly formulated or left
implicit, but either way it assigns each possible example to the concept
or not. Thus, a concept can be formally regarded as a function from the set
of all possible examples to the Boolean set
.
Other communities, such as the KDD community, prefer to deal with
a straightforward extension of concept learning ,knownasthe classification
task . In this case, we search for a function that maps the set of all possible
examples into a predefined set of class labels which are not limited to the
Boolean set. Most frequently, the goal of the classifiers inducers is formally
defined as:
Given a training set S with input attributes set A =
{
True, False
}
a 1 ,a 2 ,...,a n }
and a nominal target attribute y from an unknown fixed distribution D
over the labeled instance space, the goal is to induce an optimal classifier
with minimum generalization error.
The generalization error is defined as the misclassification rate over the
distribution D . In case of the nominal attributes, it can be expressed as:
ε ( DT ( S ) ,D )=
x,y∈U
{
D ( x, y )
·
L ( y, DT ( S )( x )) ,
(3.1)
where L ( y, DT ( S )( x )) is the zero one loss function defined as:
L ( y, DT ( S )( x )) = 0
if y = DT ( S )( x )
.
(3.2)
1
if y
= DT ( S )( x )
In case of numeric attributes the sum operator is replaced with the
integration operator.
Search WWH ::




Custom Search