Java Reference
In-Depth Information
The performSplit() method partitions a set of items according to a split
feature. It returns the resulting subsets of items, each keyed by the
corresponding value of that feature.
private static Map performSplit(Map items, String split,
                                Collection possibleValues){
    // Create one empty partition for each possible value of the feature.
    Map partitions = new HashMap();
    for (Iterator iter = possibleValues.iterator();
         iter.hasNext();) {
        String value = (String) iter.next();
        partitions.put(value, new HashMap());
    }
    // Copy each item, with its mapped value, into the partition
    // that matches the item's value for the split feature.
    Iterator it = items.keySet().iterator();
    while (it.hasNext()){
        Item item = (Item)it.next();
        String splitValue = item.value(split);
        Map partition = (Map)partitions.get(splitValue);
        partition.put(item,items.get(item));
    }
    return partitions;
}
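To make the shape of the returned map concrete, here is a minimal, self-contained sketch of the same partitioning step written with generics. It is an illustration, not the book's code: the Item class below is a hypothetical stand-in (the real Item class is not shown in this section), the class name SplitSketch and the sample feature "outlook" are invented for the example, and the use of computeIfAbsent is an added safeguard for split values missing from possibleValues.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SplitSketch {
    // Hypothetical minimal Item: it only stores feature values by name.
    public static class Item {
        private final Map<String, String> features;
        public Item(Map<String, String> features) { this.features = features; }
        public String value(String feature) { return features.get(feature); }
    }

    // Same idea as performSplit(): one partition per feature value,
    // each partition mapping an item to the value it had in 'items'.
    public static Map<String, Map<Item, String>> performSplit(
            Map<Item, String> items, String split,
            Collection<String> possibleValues) {
        Map<String, Map<Item, String>> partitions = new HashMap<>();
        for (String value : possibleValues) {
            partitions.put(value, new HashMap<>());
        }
        for (Map.Entry<Item, String> entry : items.entrySet()) {
            String splitValue = entry.getKey().value(split);
            // Assumption: create a partition on demand if the value was
            // not listed in possibleValues, instead of failing.
            partitions.computeIfAbsent(splitValue, k -> new HashMap<>())
                      .put(entry.getKey(), entry.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        Map<Item, String> items = new HashMap<>();
        items.put(new Item(Collections.singletonMap("outlook", "sunny")), "no");
        items.put(new Item(Collections.singletonMap("outlook", "sunny")), "yes");
        items.put(new Item(Collections.singletonMap("outlook", "rainy")), "yes");

        Map<String, Map<Item, String>> partitions = performSplit(
                items, "outlook", Arrays.asList("sunny", "rainy", "overcast"));
        // Print the size of each partition, one line per feature value.
        for (Map.Entry<String, Map<Item, String>> e : partitions.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size());
        }
    }
}
```

Note that a value with no matching items, such as "overcast" here, still gets an (empty) partition, which is why a caller can safely look up every entry of possibleValues in the result.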
The evaluateSplitGain() method evaluates the information gain that would
result from splitting a set of items according to a given split feature; it
computes Equation 4.4.
private static double evaluateSplitGain(Map items,
                                        String split, Collection possibleValues){
    // Information content of the whole set, before the split.
    double origInfo = information(items);
    double splitInfo = 0;
    Map partitions = performSplit(items,split,possibleValues);
    double size = items.size();
    // Sum the information of each partition, weighted by its relative size.
    for (Iterator iter = possibleValues.iterator();
         iter.hasNext();) {
        String value = (String) iter.next();
        Map partition = (Map)partitions.get(value);
        double partitionSize = partition.size();
        double partitionInfo = information(partition);
        splitInfo += partitionSize/size*partitionInfo;
    }
    return origInfo - splitInfo;
}
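Equation 4.4 is not reproduced in this section, but the quantity computed by the method can be read off the code. In the notation below, which is an assumption made for this illustration, S is the set of items, f the split feature, V_f its possible values, S_v the partition of items whose value for f is v, and I the information content returned by information():

```latex
\mathrm{Gain}(S, f) = I(S) - \sum_{v \in V_f} \frac{|S_v|}{|S|} \, I(S_v)
```

The first term corresponds to origInfo, and the sum corresponds to splitInfo, where each partition's information is weighted by partitionSize/size.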
The information() method computes the information content of a set of
items. The categories represent the symbols of the alphabet. As described in
Equation 4.1, the information can be computed from the frequency of each
symbol. We therefore first count the number of occurrences of each symbol,
then apply the equation to compute the information content.
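The two steps just described (count the symbols, then apply the equation) can be sketched as follows. This is a hedged illustration rather than the book's listing: it assumes that Equation 4.1 is the usual Shannon entropy in bits, that the category of each item is the value stored in the map, and that an empty set has zero information. The class name InformationSketch and the sample data are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class InformationSketch {
    // Assumed form of Equation 4.1: I = - sum over categories of
    // p * log2(p), where p is the relative frequency of the category.
    public static <K> double information(Map<K, String> items) {
        if (items.isEmpty()) {
            return 0.0; // assumption: an empty set carries no information
        }
        // Step 1: count the occurrences of each category (symbol).
        Map<String, Integer> counts = new HashMap<>();
        for (String category : items.values()) {
            counts.merge(category, 1, Integer::sum);
        }
        // Step 2: apply the entropy formula to the relative frequencies.
        double size = items.size();
        double info = 0.0;
        for (int count : counts.values()) {
            double p = count / size;
            info -= p * Math.log(p) / Math.log(2); // log base 2
        }
        return info;
    }

    public static void main(String[] args) {
        Map<String, String> items = new HashMap<>();
        items.put("item1", "yes");
        items.put("item2", "yes");
        items.put("item3", "no");
        items.put("item4", "no");
        // Two equally frequent categories: about one bit of information.
        System.out.println(information(items));
    }
}
```

A set whose items all share one category yields zero, while an even split between two categories yields one bit, which is the behaviour evaluateSplitGain() relies on when it compares the set before and after a split.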