of N. This process of dividing the examples and building children can proceed
to any number of levels. We can stop, and create a leaf, if the group of items
for a node is homogeneous; i.e., they are all positive or all negative examples.
However, we may wish to stop and create a leaf with the majority decision
for a group, even if the group contains both positive and negative examples.
The reason is that the statistical significance of a small group may not be high
enough to rely on. For that reason, a variant strategy is to create an ensemble of
decision trees, each using different predicates, but to allow the trees to be deeper
than the available data justifies. Such trees are called overfitted. To
classify an item, apply all the trees in the ensemble, and let them vote on the
outcome. We shall not consider this option here, but give a simple hypothetical
example of a decision tree.
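The stopping rule and the ensemble vote described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names leaf_decision and ensemble_vote, the min_group_size threshold (a crude stand-in for a real statistical-significance test), and the choice to break ties toward "negative" are all hypothetical and do not come from the text.

```python
from collections import Counter


def leaf_decision(labels, min_group_size=5):
    """Return the leaf label if we should stop splitting, else None.

    labels: non-empty list of booleans (True = positive example).
    min_group_size: hypothetical cutoff; groups smaller than this are
    treated as too small to justify another split.
    """
    if not labels:
        raise ValueError("a node must hold at least one example")
    positives = sum(labels)
    negatives = len(labels) - positives
    # Homogeneous group: all positive or all negative examples.
    if positives == 0 or negatives == 0:
        return positives > 0
    # Mixed but small group: stop anyway and take the majority decision
    # (assumption: ties go to the negative class).
    if len(labels) < min_group_size:
        return positives > negatives
    # Otherwise keep splitting.
    return None


def ensemble_vote(trees, item):
    """Classify item by majority vote over an ensemble of trees.

    trees: list of callables, each mapping an item to True or False.
    """
    votes = Counter(tree(item) for tree in trees)
    return votes[True] > votes[False]
```

For instance, leaf_decision([True, False, False]) stops with a negative leaf because the group is small, while a mixed group of ten examples returns None and would be split further.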
Example 9.6: Suppose our items are news articles, and features are the
high-TF.IDF words (keywords) in those documents. Further suppose there is a user
U who likes articles about baseball, except articles about the New York Yankees.
The row of the utility matrix for U has 1 if U has read the article and is blank if
not. We shall take the 1's as “like” and the blanks as “doesn't like.” Predicates
will be boolean expressions of keywords.
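As a rough sketch of this setup, the following Python turns a utility-matrix row into like/doesn't-like labels and divides the articles by a keyword predicate. The article ids, keyword sets, and the sample predicate are invented for illustration; only the convention that a 1 means "like" and a blank means "doesn't like" is taken from the example.

```python
def labels_from_utility_row(row, article_ids):
    """Map each article id to True ("like") or False ("doesn't like").

    row: dict holding 1 for articles U has read; a missing key is a blank.
    """
    return {a: row.get(a) == 1 for a in article_ids}


def split_by_predicate(articles, predicate):
    """Divide article ids into those that satisfy the predicate and those that fail it.

    articles: dict mapping article id -> set of keywords.
    predicate: callable taking a keyword set and returning a boolean.
    """
    satisfied, failed = [], []
    for article_id, keywords in articles.items():
        (satisfied if predicate(keywords) else failed).append(article_id)
    return satisfied, failed


# Hypothetical data: three articles and a utility row with a single 1.
articles = {
    "a1": {"homerun", "inning"},
    "a2": {"election", "senate"},
    "a3": {"batter", "pitcher", "Yankees"},
}
utility_row = {"a1": 1}

labels = labels_from_utility_row(utility_row, articles)
# A sample boolean expression of keywords: "homerun" OR "inning".
satisfied, failed = split_by_predicate(
    articles, lambda kw: "homerun" in kw or "inning" in kw
)
```

Each of the two groups returned by split_by_predicate would become the examples for one child of the node that applies the predicate.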
Since U generally likes baseball, we might find that the best predicate for
the root is “homerun” OR (“batter” AND “pitcher”). Items that satisfy the
predicate will tend to be positive examples (articles with 1 in the row for U in
the utility matrix), and items that fail to satisfy the predicate will tend to be
negative examples (blanks in the utility-matrix row for U). Figure 9.3 shows
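the root as well as the rest of the decision tree.
[Figure: the root node tests "homerun" OR ("batter" AND "pitcher"). One child of the root is a leaf labeled "Doesn't Like"; the other is a node testing "Yankees" OR "Jeter" OR "Teixeira", whose two children are leaves labeled "Doesn't Like" and "Likes".]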
Figure 9.3: A decision tree
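The tree of Figure 9.3 can be written as a small Python function, offered here as a sketch and not as a definitive implementation. We assume an article is represented by its set of keywords, and we assume the branch directions implied by U's stated tastes: articles failing the root predicate are "Doesn't Like," and among those that pass, articles mentioning the Yankees keywords are "Doesn't Like" and the rest are "Likes." The function names are hypothetical.

```python
def root_predicate(keywords):
    """"homerun" OR ("batter" AND "pitcher")."""
    return "homerun" in keywords or (
        "batter" in keywords and "pitcher" in keywords
    )


def yankees_predicate(keywords):
    """"Yankees" OR "Jeter" OR "Teixeira"."""
    return bool(set(keywords) & {"Yankees", "Jeter", "Teixeira"})


def classify(keywords):
    """Walk the two-level tree and return the leaf label for one article."""
    if not root_predicate(keywords):
        return "Doesn't Like"  # not a baseball article
    if yankees_predicate(keywords):
        return "Doesn't Like"  # baseball, but about the Yankees
    return "Likes"  # baseball, not about the Yankees
```

Under these assumptions, classify({"homerun", "Fenway"}) returns "Likes", while classify({"batter", "pitcher", "Jeter"}) returns "Doesn't Like".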
Suppose that the group of articles that do not satisfy the predicate includes