Chapter 10
A Walk-through-guide for Using Decision Trees Software

10.1 Introduction
Several decision tree software packages are available on the internet. In the
following sections, we review some of the most popular packages, focusing on
open-source solutions that are freely available.
For illustrative purposes, we will use the Iris dataset, one of the
best-known datasets in the pattern recognition literature. It was first
introduced by R. A. Fisher (1936). The goal is to classify flowers into
Iris species (Iris Setosa, Iris Versicolour and Iris Virginica) according
to their characteristics.
The dataset consists of 150 instances. Each instance refers to one flower
and contains that flower's features: the length and the width of the sepal
and of the petal. The label of every instance is one of the strings
Iris Setosa, Iris Versicolour and Iris Virginica. The task is to induce a
classifier that predicts the class to which a flower belongs using its four
attributes: sepal length, sepal width, petal length and petal width, all
measured in cm.
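To make the classification task concrete, the following minimal sketch shows how an already-induced decision tree could predict a flower's class from its four measurements. The node types `Leaf` and `Split` and the function `predict` are hypothetical illustrations, not the API of any package reviewed in this chapter, and the thresholds in the usage lines are placeholders, not values learned from the Iris data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: the class predicted for flowers that reach it."""
    label: str


@dataclass
class Split:
    """Internal node: a binary test "attribute <= threshold" (in cm)."""
    attribute: str    # one of the four attribute names, e.g. "petal length"
    threshold: float  # split value in centimeters
    left: Node        # branch taken when the value is <= threshold
    right: Node       # branch taken when the value is > threshold


Node = Union[Leaf, Split]


def predict(tree: Node, flower: dict[str, float]) -> str:
    """Route a flower's measurements from the root down to a leaf."""
    node = tree
    while isinstance(node, Split):
        value = flower[node.attribute]
        node = node.left if value <= node.threshold else node.right
    return node.label


# Usage with PLACEHOLDER thresholds (hypothetical, not learned from the data).
tree = Split(
    "petal length", 2.5,
    Leaf("Iris Setosa"),
    Split("petal width", 1.7, Leaf("Iris Versicolour"), Leaf("Iris Virginica")),
)
flower = {"sepal length": 5.0, "sepal width": 3.4,
          "petal length": 1.5, "petal width": 0.2}
print(predict(tree, flower))  # Iris Setosa
```

Each internal node tests a single attribute against a threshold, so a prediction costs at most one comparison per level of the tree; the software packages discussed in this chapter differ mainly in how they choose these tests during induction.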
Table 10.1 illustrates a segment of the Iris dataset. The dataset
contains three classes that correspond to three types of iris flowers:
$dom(y) = \{IrisSetosa, IrisVersicolor, IrisVirginica\}$. Each instance is
characterized by four numeric features (measured in centimeters):
$A = \{sepallength, sepalwidth, petallength, petalwidth\}$.
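The two sets dom(y) and A fix the schema of every instance. As a minimal sketch, assuming an instance is stored as a Python dict keyed by the attribute names in A plus a hypothetical `"class"` key for the label y, a record can be checked against that schema as follows. The names `DOM_Y`, `A`, `LABEL_KEY` and `is_valid_instance` are illustrative, not part of any package reviewed here.

```python
# dom(y) and A, spelled exactly as in the definitions above.
DOM_Y = frozenset({"IrisSetosa", "IrisVersicolor", "IrisVirginica"})
A = ("sepallength", "sepalwidth", "petallength", "petalwidth")

LABEL_KEY = "class"  # hypothetical key under which the label y is stored


def is_valid_instance(record: dict) -> bool:
    """Check that record has exactly the attributes in A plus a label in dom(y)."""
    if set(record) != set(A) | {LABEL_KEY}:
        return False  # missing or unexpected attribute
    if record[LABEL_KEY] not in DOM_Y:
        return False  # label outside dom(y)
    # Each attribute must be a numeric measurement (cm); bool is excluded
    # because it is a subclass of int in Python.
    return all(
        isinstance(record[a], (int, float)) and not isinstance(record[a], bool)
        for a in A
    )


# Illustrative record (values chosen for the example).
record = {"sepallength": 5.1, "sepalwidth": 3.5,
          "petallength": 1.4, "petalwidth": 0.2, "class": "IrisSetosa"}
print(is_valid_instance(record))  # True
```

A frozenset mirrors dom(y) as an unordered set of labels, while A is kept as a tuple so that the attribute order stays fixed if records are later converted into feature vectors.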