3.2 Definition of the Classification Problem
The machine learning community was among the first to introduce the problem of concept learning. Concepts are mental categories for objects, events, or ideas that have a common set of features. According to Mitchell (1997): "each concept can be viewed as describing some subset of objects or events defined over a larger set" (e.g. the subset of vehicles that constitutes trucks). To learn a concept is to infer its general definition from a set of examples. This definition may be either explicitly formulated or left implicit, but either way it assigns each possible example to the concept or not. Thus, a concept can be regarded as a function from the instance space to the Boolean set, namely $c : X \to \{-1, 1\}$. Alternatively, one can refer to a concept $c$ as a subset of $X$, namely $\{x \in X : c(x) = 1\}$. A concept class $C$ is a set of concepts.
Other communities, such as the KDD community, prefer to deal with a straightforward extension of concept learning, known as the classification task. In this case, we search for a function that maps the set of all possible examples into a predefined set of class labels, which is not limited to the Boolean set. Most frequently, the goal of classifier inducers is formally defined as follows:
Given a training set $S$ with input attribute set $A = \{a_1, a_2, \ldots, a_n\}$ and a nominal target attribute $y$ from an unknown fixed distribution $D$ over the labeled instance space, the goal is to induce an optimal classifier with minimum generalization error.
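As a hedged illustration of this setting (a sketch, not the book's procedure), the snippet below uses scikit-learn's DecisionTreeClassifier as one possible inducer; the training set $S$, the two attributes, and the three class labels are invented for the example:

```python
from sklearn.tree import DecisionTreeClassifier

# Training set S: each row is an instance described by attributes a_1, a_2;
# S_y holds the nominal target labels (three classes, so not a Boolean set).
S_X = [[0, 0], [0, 1], [1, 0], [1, 1]]
S_y = ["red", "red", "green", "blue"]

# The inducer builds a classifier DT(S) from the training set; it can only
# approximate the classifier with minimum generalization error, since the
# underlying distribution D is unknown.
dt = DecisionTreeClassifier().fit(S_X, S_y)
print(dt.predict([[1, 1]]))  # DT(S)(x) for an instance x
```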
The generalization error is defined as the misclassification rate over the distribution $D$. In the case of nominal attributes it can be expressed as:

$$\varepsilon(DT(S), D) = \sum_{\langle x, y \rangle \in U} D(x, y) \cdot L(y, DT(S)(x)), \qquad (3.1)$$
where $U$ denotes the labeled instance space and $L(y, DT(S)(x))$ is the zero-one loss function defined as:
$$L(y, DT(S)(x)) = \begin{cases} 0 & \text{if } y = DT(S)(x) \\ 1 & \text{if } y \neq DT(S)(x) \end{cases} \qquad (3.2)$$
In the case of numeric attributes, the sum operator is replaced with the integration operator.
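Since $D$ is unknown in practice, the generalization error of Eq. (3.1) is usually estimated by averaging the zero-one loss of Eq. (3.2) over a held-out sample drawn from $D$. A minimal sketch of that estimate (function names and data are illustrative assumptions):

```python
def zero_one_loss(y_true, y_pred):
    """Eq. (3.2): 0 if the prediction matches the label, 1 otherwise."""
    return 0 if y_true == y_pred else 1

def estimated_generalization_error(classifier, sample):
    """Empirical misclassification rate over labeled pairs (x, y)."""
    losses = [zero_one_loss(y, classifier(x)) for x, y in sample]
    return sum(losses) / len(losses)

# Usage with any callable playing the role of DT(S):
held_out = [([0, 1], "red"), ([1, 0], "green"), ([1, 1], "red")]
DT_S = lambda x: "red" if x[0] == 0 else "green"
print(estimated_generalization_error(DT_S, held_out))  # 0.333...: one miss
```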