Training Decision Trees - Data Mining with Decision Trees: Theory and Applications

Chapter 2

Training Decision Trees

2.1 What is Learning?

The aim of this chapter is to provide an intuitive description of training in

decision trees. The main goal of learning is to improve at some task with

experience. This goal requires the definition of three components:

(1) Task T that we would like to improve with learning.

(2) Experience E to be used for learning.

(3) Performance measure P that is used to measure the improvement.

To better understand these components, consider the

problem of email spam, in which spammers exploit electronic

mail systems to send unsolicited bulk messages.

A spam message is any message that the user does not want to receive

and did not ask to receive. Machine learning techniques can be used to

automatically filter such spam messages. Applying machine learning in this

case requires the definition of the above-mentioned components, as follows:

(1) The task T is to identify spam emails.

(2) The experience E is a set of emails that were labeled by users as spam

or non-spam (ham).

(3) The performance measure P is the percentage of spam emails that

were correctly filtered and the percentage of ham (non-spam) emails

that were incorrectly filtered out.
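
The two percentages that make up the performance measure can be computed directly from a set of labeled emails and the filter's decisions. The following is a minimal sketch in Python, not a definitive implementation: the function name filter_performance, the label strings "spam" and "ham", and the toy data are all assumptions made for illustration and do not come from the text.

```python
def filter_performance(true_labels, predicted_labels):
    """Return (spam_caught_pct, ham_lost_pct) for a spam filter.

    spam_caught_pct: percentage of spam emails that were correctly filtered.
    ham_lost_pct:    percentage of ham emails that were incorrectly filtered out.
    Labels are assumed to be the strings "spam" and "ham" (illustrative choice).
    """
    pairs = list(zip(true_labels, predicted_labels))

    # Count how many emails of each true class we have.
    spam_total = sum(1 for truth, _ in pairs if truth == "spam")
    ham_total = sum(1 for truth, _ in pairs if truth == "ham")

    # Spam that the filter flagged as spam (correctly filtered).
    spam_caught = sum(1 for truth, pred in pairs
                      if truth == "spam" and pred == "spam")
    # Ham that the filter flagged as spam (incorrectly filtered out).
    ham_lost = sum(1 for truth, pred in pairs
                   if truth == "ham" and pred == "spam")

    # Guard against an empty class to avoid division by zero.
    spam_caught_pct = 100.0 * spam_caught / spam_total if spam_total else 0.0
    ham_lost_pct = 100.0 * ham_lost / ham_total if ham_total else 0.0
    return spam_caught_pct, ham_lost_pct


# Toy experience E: four spam emails and five ham emails (made-up data).
true_labels = ["spam", "spam", "spam", "spam",
               "ham", "ham", "ham", "ham", "ham"]
# Hypothetical filter decisions for the same nine emails.
predicted_labels = ["spam", "spam", "spam", "ham",
                    "ham", "ham", "ham", "ham", "spam"]

spam_caught_pct, ham_lost_pct = filter_performance(true_labels, predicted_labels)
print(f"spam correctly filtered: {spam_caught_pct:.1f}%")
print(f"ham incorrectly filtered out: {ham_lost_pct:.1f}%")
```

Reporting the two percentages separately, rather than as a single accuracy figure, keeps the two kinds of error visible: a filter that lets some spam through is usually less costly than one that discards legitimate mail.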

2.2 Preparing the Training Set

In order to automatically filter spam messages, we need to train a

classification model. Data is crucial for training the classifier
