Chapter 2
Training Decision Trees
2.1 What is Learning?
The aim of this chapter is to provide an intuitive description of training in
decision trees. The main goal of learning is to improve at some task with
experience. This goal requires the definition of three components:
(1) Task T that we would like to improve with learning.
(2) Experience E to be used for learning.
(3) Performance measure P that is used to measure the improvement.
In order to better understand the above components, consider the
problem of email spam. We all suffer from email spam in which spammers
exploit the electronic mail systems to send unsolicited bulk messages.
A spam message is any message that the user does not want to receive
and did not ask to receive. Machine learning techniques can be used to
automatically filter such spam messages. Applying machine learning in this
case requires the definition of the above-mentioned components, as follows:
(1) The task T is to identify spam emails.
(2) The experience E is a set of emails that were labeled by users as spam
or non-spam (ham).
(3) The performance measure P is the percentage of spam emails that
were correctly filtered and the percentage of ham (non-spam) emails
that were incorrectly filtered out.
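As a rough illustration of the performance measure P for the spam example, the two percentages could be computed as in the following minimal sketch. The function name spam_filter_performance, the label strings "spam" and "ham", and the toy label lists are assumptions made for this example only; they do not come from the text.

```python
def spam_filter_performance(true_labels, predicted_labels):
    """Return a pair of percentages:
    (spam emails correctly filtered, ham emails incorrectly filtered out).

    Labels are assumed to be the strings "spam" and "ham".
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(true_labels, predicted_labels))
    spam_total = sum(1 for t, _ in pairs if t == "spam")
    ham_total = sum(1 for t, _ in pairs if t == "ham")

    # Spam that the filter labeled as spam (correctly filtered).
    spam_caught = sum(1 for t, p in pairs if t == "spam" and p == "spam")
    # Ham that the filter labeled as spam (incorrectly filtered out).
    ham_lost = sum(1 for t, p in pairs if t == "ham" and p == "spam")

    # Guard against empty classes to avoid division by zero.
    spam_rate = 100.0 * spam_caught / spam_total if spam_total else 0.0
    ham_rate = 100.0 * ham_lost / ham_total if ham_total else 0.0
    return spam_rate, ham_rate


# Toy experience E: user-provided labels and a hypothetical filter's output.
true_labels = ["spam", "spam", "ham", "ham", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "spam", "spam", "ham", "ham", "spam"]

spam_rate, ham_rate = spam_filter_performance(true_labels, predicted)
print(f"spam correctly filtered: {spam_rate:.1f}%")
print(f"ham incorrectly filtered out: {ham_rate:.1f}%")
```

A good filter aims for a high first percentage and a low second one; reporting both avoids rewarding a filter that simply marks every message as spam.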
2.2 Preparing the Training Set
To filter spam messages automatically, we need to train a
classification model. Obviously, data is crucial for training the classifier