Database Reference
In-Depth Information
stopping-rules are not met, then DFID searches for a split, according to
some splitting rule, represented in Figure 14.1 by the general function split.
Splits in DFID are based on the values of a certain candidate attribute.
We assume that there exists at least a single attribute that can create a split
(or otherwise the stopping-rules would have indicated that there should be
no more splits).
The function split receives a training set, a set of candidate attributes
and optionally some additional inputs. It then returns the attribute upon
whose values the split is based and a set of descendents nodes. Recall that
upon its creation, each node is attached with a rule, which defines the
sub-space of X that the node represents. The rules for the descendent
nodes are conjunctions of the root's rule and restrictions on the values
of the selected attribute. The split that was found may be then subjected
to a validation examination, represented, in Figure 14.1 by the general
function validate. If a split is found to be invalid, then DFID will search for
another split (another attribute). If there are no more candidate attributes,
I will be trained using all the training instances and the classifier that
results will be attached to the root node. As soon as a valid split is
found, the descendent nodes that were created by the split are recursively
considered for further splits. Further splits are achieved by the recurrence of
DFID. In the recurrence, only a subset of the training instances is relevant
(the instances that are actually sorted to the certain descendent node).
In addition, the attribute, which defined the current split, is removed
from the list of candidate attributes. The descendents are finally linked
to their parent (the root). Different DFID implementations may differ in
all or some of the procedures that implement the three main framework
components — stopping-rules (the function StoppingCriterion), splitting
rules (the function split) and split validation examinations (the function
validate).
15.2.1
Stopping Rules
Stopping rules are checked by the general function StoppingCriterion
(Figure 14.1). However, it should be noticed that a negative answer by
this function is not the only condition that stops the DFID recurrence;
another, and even more natural, condition, is the lack of any valid split.
According to the simple stopping rule that NBTree uses, no splits
are considered when there are 30 instances or less in the examined
node. Splitting a node with only a few training instances will hardly
Search WWH ::




Custom Search