Classifier - Software Development Case Studies in Java

The classifier must be as generic as possible so that it can also be applied
to similar problems. It has to accommodate extensions such as new
characteristics of the items, and different items described by a brand new set
of characteristics. During the development of the classifier the reference
problem will be car accident and theft insurance, but we will keep the
system open to extensions.

4.2.1 Domain models

The artificial intelligence and data-mining literature contains many works
dealing with classification and the construction of classifiers. The
essential concepts involved in building and using a classifier are:

■ Item: the element that is, or has to be, assigned to a category. It is described by a set of features.
■ Feature: a name and a value. It is used to describe an item.
■ Category: a tag that is applied to an item based on its features.
■ Classifier: the tool that automatically assigns an item to a category.
■ Training set: a set of items that have already been assigned to a category. It is used to capture the assignment criteria.
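As a rough illustration, the concepts listed above could map onto Java types along the following lines. This is only a sketch: the class names, the use of plain strings for values and categories, and the `valueOf` lookup method are assumptions made here for illustration, not the design developed in the case study.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Feature: a name together with a value that describes an item.
final class Feature {
    final String name;
    final String value;

    Feature(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Item: described by a set of features; the category is null until assigned.
final class Item {
    private final Map<String, Feature> features = new LinkedHashMap<>();
    private String category;

    Item(Feature... describedBy) {
        for (Feature f : describedBy) {
            features.put(f.name, f);
        }
    }

    // Returns the value of the named feature, or null if the item lacks it.
    String valueOf(String featureName) {
        Feature f = features.get(featureName);
        return f == null ? null : f.value;
    }

    String getCategory() { return category; }

    void setCategory(String category) { this.category = category; }
}

// Classifier: the tool that assigns an item to a category.
interface Classifier {
    String classify(Item item);
}
```

Under these assumptions a training set needs no type of its own: it can be any collection of `Item` objects whose category has already been set.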

We will assume that all features are of nominal type. That is, a feature
can take values only from a finite set whose elements are completely defined
by enumerating them. This assumption simplifies all the algorithms, both those
that identify the classification rules and those that classify items.
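One possible way to express a nominal type in code is a class that holds the enumerated set of legal values and can check whether a given value belongs to it. The class name `NominalType`, its methods, and the sample values in the usage note are hypothetical and serve only to illustrate the assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// A nominal type: its legal values are exactly those enumerated at construction.
final class NominalType {
    private final String name;
    private final Set<String> allowedValues;

    NominalType(String name, String... values) {
        this.name = name;
        // LinkedHashSet keeps the enumeration order and drops duplicates.
        Set<String> copy = new LinkedHashSet<>(Arrays.asList(values));
        this.allowedValues = Collections.unmodifiableSet(copy);
    }

    String getName() { return name; }

    Set<String> getAllowedValues() { return allowedValues; }

    // True only for values that were enumerated when the type was defined.
    boolean isAllowed(String value) {
        return allowedValues.contains(value);
    }
}
```

For instance, `new NominalType("carType", "city", "sport", "family")` would accept `"sport"` and reject any value outside the three that were enumerated.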

There are several kinds of classifier models. The decision tree is one of the
simplest (Cherkassky and Mulier 1998). It can easily represent hierarchies
of concepts and has the great advantage of being easy to understand. In a
decision tree both nodes and arcs have a label. Each non-leaf node is associated
with a split feature. Each arc to a child node is associated with one of the
possible values of the parent's split feature. Each leaf is labelled with the
name of a category.
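The tree structure just described might be represented as follows. This is a minimal sketch: the class name `Node`, its methods, and the feature and category names in the usage note are assumptions introduced for illustration. A single label field serves both roles, holding the split feature on non-leaf nodes and the category on leaves.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A decision tree node. Non-leaf nodes are labelled with a split feature,
// leaves with a category; arcs are labelled with values of the split feature.
final class Node {
    private final String label;
    private final Map<String, Node> children = new LinkedHashMap<>();

    Node(String label) {
        this.label = label;
    }

    // Adds an arc labelled with a feature value; returns this node for chaining.
    Node addChild(String arcLabel, Node child) {
        children.put(arcLabel, child);
        return this;
    }

    // A node without outgoing arcs is a leaf.
    boolean isLeaf() {
        return children.isEmpty();
    }

    String getLabel() {
        return label;
    }

    // Returns the child reached through the given arc, or null if there is none.
    Node follow(String arcLabel) {
        return children.get(arcLabel);
    }
}
```

A tiny tree could then be built with `new Node("carType").addChild("sport", new Node("highRisk")).addChild("family", new Node("lowRisk"))`, where the root splits on a hypothetical `carType` feature and both children are leaves.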

The algorithm to assign a category to a new item is described below
(Algorithm 4.1). The idea behind the algorithm is to find a path, starting
from the root, that matches the features of the item. At each node the
arc corresponding to the item's value for that node's feature is followed.

Input: an item, a decision tree
Output: a category for the item
1 start by setting the root node as the current node
2 repeat while the current node is not a leaf
  (a) the node label is the name of the feature to be considered
  (b) consider the value of that feature in the item
  (c) follow the arc corresponding to the value
  (d) the node reached becomes the current node
3 the label of the leaf node is the category of the item

Algorithm 4.1 Categorization
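The steps of Algorithm 4.1 translate almost line by line into Java. The sketch below assumes that an item is a map from feature names to nominal values and that the tree uses a small nested `Node` class; both are illustrative choices, as are the names `DecisionTreeSketch` and `categorize`. The algorithm does not say what to do when no arc matches the item's value, so throwing an exception in that case is an assumption as well.

```java
import java.util.HashMap;
import java.util.Map;

final class DecisionTreeSketch {

    // Node label: split feature name on non-leaf nodes, category on leaves.
    static final class Node {
        final String label;
        final Map<String, Node> arcs = new HashMap<>();

        Node(String label) {
            this.label = label;
        }

        // Adds an arc labelled with a feature value; returns this for chaining.
        Node arc(String value, Node child) {
            arcs.put(value, child);
            return this;
        }

        boolean isLeaf() {
            return arcs.isEmpty();
        }
    }

    // Walks from the root to a leaf and returns the leaf's label.
    static String categorize(Map<String, String> item, Node root) {
        Node current = root;                      // step 1
        while (!current.isLeaf()) {               // step 2
            String feature = current.label;       // (a) feature to consider
            String value = item.get(feature);     // (b) the item's value for it
            Node next = current.arcs.get(value);  // (c) follow the matching arc
            if (next == null) {
                // Assumption: the source does not cover this case.
                throw new IllegalArgumentException(
                        "No arc for " + feature + " = " + value);
            }
            current = next;                       // (d) new current node
        }
        return current.label;                     // step 3
    }
}
```

With a hypothetical tree whose root splits on `ageRange` and whose `under25` branch splits again on `carType`, an item with `ageRange = under25` and `carType = sport` follows two arcs and ends at whichever leaf hangs under `sport`.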
