Classifier - Software Development Case Studies in Java

Java Reference

In-Depth Information

Node n # new Node(splitFeature);

// 3) for each possible value vi of s

FeatureType splitType #

(FeatureType)features.get(splitFeature);

Map partitions # performSplit(items,splitFeature,

splitType.allowedValues());

for (Iterator iter # partitions.keySet().iterator();

iter.hasNext();) {

String value # (String) iter.next();

// a) be ni the result of a recursive execution of this

// algorithm where:

// the fist input is: Ti # { item in T | item.s ## vi }

// the second input is: A - { s }

Map partition # (Map)partitions.get(value);

Map remaining # new HashMap(features);

remaining.remove(splitFeature);

Node child # buildDecisionTree(partition,remaining);

// b) set ni as child node of n and label the

// connecting arc vi

n.addChild(value,child);

}

// 4) n is the resulting root node of the decision tree

return n;

}

In step 0 to determine whether all the items in a set belong to the same

category we check if the information content is null (see Equation 4.3).

In step 3, the loop operates on each of the sets of items resulting from the

split. It is more efficient to generate all these sets in a single step instead of

generating them separately. To keep the link between each value of the split

feature and the relative split-set a Map is used. A map stores associations

between a key (the possible value) and a value (the corresponding set of

items), making it very easy to obtain the set of items given the value.

To keep the method buildDecisionTree as close as possible to the original

algorithm (Algorithm 4.2), several details are missing:

how to determine the category of a set of items,

■

how to find the “best” split feature,

■

how to split a set of items,

■

how to calculate information content (in terms of number of bits) of a set

of items,

■

how to calculate the information gain deriving from a split.

■

The findCategory() method determines which is the category of a set of

items. This is not as easy as it seems, since the set can be empty or it can

contain items from several categories. If the set is empty then the category

is unknown, otherwise the frequency of the categories must be computed.

Software Development Case Studies in Java

Search WWH ::

Custom Search

Home