Chapter 3
A Generic Algorithm for Top-Down Induction of Decision Trees
3.1 Training Set
Before presenting the pseudocode for a top-down decision-tree induction
algorithm, we present several definitions and notations that will be used.
In a typical supervised learning scenario, a training set is given and the
goal is to form a description that can be used to predict previously unseen
examples.
In many machine learning algorithms, the training set size and the
classification accuracy are positively correlated. In particular, the predictive
performance of a decision tree is expected to be poor if the training set
used to train the model is too small. Moreover, the size of a classification
tree is strongly correlated with the training set size. This phenomenon can be
explained by the fact that in most decision tree algorithms (such as CART
and C4.5) the number of leaves is bounded by the number of instances, since
a node cannot be split without a sufficient number of instances, as the
sketch below illustrates.
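The bound is easy to observe empirically. The following is a minimal sketch,
assuming scikit-learn's CART-style DecisionTreeClassifier and a synthetic
dataset (both are illustrative assumptions, not part of this chapter): it
grows an unpruned tree on increasingly large training sets and counts the
resulting leaves.

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data; the dataset and library choice are assumptions
    # made purely for illustration.
    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

    for m in (50, 200, 1000, 2000):
        clf = DecisionTreeClassifier(random_state=0).fit(X[:m], y[:m])
        # Every leaf holds at least one training instance, so the tree
        # can never have more leaves than the m instances it grew from.
        print(f"{m:5d} instances -> {clf.get_n_leaves():4d} leaves")

The leaf count typically grows with m and never exceeds it, matching the
bound stated above.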
The training set can be described in a variety of ways. Most frequently,
it is described as a bag instance of a certain bag schema. A bag instance
is a collection of tuples (also known as records, rows or instances) that
may contain duplicates. Each tuple is described by a vector of attribute
values. The bag schema provides the description of the attributes and their
domains. In this book, a bag schema is denoted as $B(A \cup y)$, where $A$
denotes the set of input attributes containing $n$ attributes:
$A = \{a_1, \ldots, a_i, \ldots, a_n\}$, and $y$ represents the class
variable or the target attribute.
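To make the notation concrete, here is a minimal Python sketch of a bag
schema $B(A \cup y)$ and a bag instance over it; the dataclass and the
attribute names are hypothetical illustrations, not definitions from this
chapter.

    from dataclasses import dataclass

    @dataclass
    class BagSchema:
        input_attributes: list[str]  # A = {a_1, ..., a_n}
        target_attribute: str        # y, the class variable

    # Hypothetical attribute names, used only to make the notation concrete.
    schema = BagSchema(input_attributes=["outlook", "temperature", "windy"],
                       target_attribute="play")

    # A bag instance over this schema: a collection of tuples in which
    # duplicates are permitted, hence a list rather than a set.
    training_set = [
        ("sunny", 29.4, False, "no"),
        ("sunny", 29.4, False, "no"),   # duplicate tuple, allowed in a bag
        ("overcast", 21.1, True, "yes"),
    ]

Representing the bag as a plain list (rather than a set) is what preserves
the "may contain duplicates" property of a bag instance.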
Attributes (sometimes called fields, variables or features) are typically
of one of two types: nominal (values are members of an unordered set) or
numeric (values are real numbers). For a nominal attribute $a_i$, it is
useful to denote its domain values by
$dom(a_i) = \{v_{i,1}, v_{i,2}, \ldots, v_{i,|dom(a_i)|}\}$, where
$|dom(a_i)|$ stands for the finite cardinality of the domain.
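As a small illustration of the two attribute types and the $dom(\cdot)$
notation, the sketch below computes the observed domain of a nominal
attribute and its cardinality; the dom helper is hypothetical, not a
function defined in this chapter.

    # Hypothetical helper: the observed domain of an attribute column.
    def dom(column):
        """Return dom(a_i) = {v_i,1, ..., v_i,|dom(a_i)|} as an unordered set."""
        return set(column)

    outlook = ["sunny", "overcast", "rain", "sunny"]  # nominal attribute
    temperature = [29.4, 21.1, 18.3, 22.8]            # numeric attribute

    print(dom(outlook))       # unordered, finite set of values
    print(len(dom(outlook)))  # |dom(outlook)| = 3, the domain's cardinality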