part of the figure shows the partitioning induced by the decision tree. For example,
the third leaf in the tree corresponds to all non-native people without a university
diploma. The leaves can hence be seen as non-overlapping “profiles” dividing up the
space of all instances. Every example fits exactly one profile, and with every profile
exactly one class is associated. When a new example needs to be classified by a
decision tree, it is given the majority class label of the region/profile it falls into. If
some of the profiles are very homogeneous with respect to the sensitive attribute,
for instance containing only members of the deprived community, this may lead to
discriminatory predictions. In l3, for instance, two thirds of the instances are from
the deprived community. The relabeling technique consists of changing the labels of
those regions where doing so yields the highest reduction in discrimination while
sacrificing as little accuracy as possible. Conceptually, this method corresponds to
merging neighboring regions to form larger, less discriminatory profiles. The process
of relabeling continues until the discrimination is removed.
Example 3. Consider the example decision tree given in Figure 12.2. The discrimi-
nation of the decision tree is 20%. Suppose we want to reduce the discrimination to
5%. For each of the leaves, the table below gives how much the discrimination
changes (Δdisc) when relabeling the node, and how much the accuracy decreases
(Δacc). The node for which the tradeoff between discrimination reduction and
lowered accuracy is most beneficial is selected first for relabeling.
Node    Δacc    Δdisc    Δdisc/Δacc
l1      40%     0%       0
l2      10%     10%      1
l3      30%     10%      1/3
In this particular case, the reduction algorithm hence picks l2 to relabel; that is, the
split on degree is removed and leaves l 2 and l 3 are merged.
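The greedy selection step above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names and the (name, Δacc, Δdisc) tuples are assumptions, and for simplicity it treats the per-leaf Δ values as fixed, whereas in practice merging leaves changes the figures for the remaining nodes.

```python
def pick_leaf_to_relabel(leaves):
    """Return the leaf whose relabeling gives the most discrimination
    reduction per unit of accuracy lost (highest Δdisc/Δacc)."""
    def ratio(leaf):
        name, d_acc, d_disc = leaf
        if d_acc > 0:
            return d_disc / d_acc
        # Free discrimination reduction is ideal; no effect at all scores 0.
        return float("inf") if d_disc > 0 else 0.0
    return max(leaves, key=ratio)

def relabel_until(leaves, disc, target):
    """Greedily relabel leaves until discrimination drops to the target."""
    chosen = []
    remaining = list(leaves)
    while disc > target and remaining:
        leaf = pick_leaf_to_relabel(remaining)
        remaining.remove(leaf)
        chosen.append(leaf[0])
        disc -= leaf[2]  # relabeling this leaf reduces discrimination by Δdisc
    return chosen, disc

# Figures from Example 3: (name, Δacc, Δdisc); discrimination 20%, target 5%.
leaves = [("l1", 0.40, 0.00), ("l2", 0.10, 0.10), ("l3", 0.30, 0.10)]
chosen, disc = relabel_until(leaves, disc=0.20, target=0.05)
print(chosen, disc)  # l2 is picked first, since its ratio 1 beats 1/3 and 0
```

Running the sketch on the numbers from the example selects l2 first, as in the text, and then l3, since a single relabeling only brings the discrimination down to 10%, still above the 5% target.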
12.3.3.2 Related Approaches
The idea of model correction has been explored in different settings, particularly
in cost-sensitive learning, learning from imbalanced data, and context-sensitive
or context-aware learning. Concrete examples of model correction include Naive
Bayes prior correction (see also Chapter 14 of this book), posterior probability
correction based on a confusion matrix (Morris & Misra, 2002), and nearest-
neighbor-based classification or identification correction based on the current
context, e.g., in driver-route identification (Mazhelis, Zliobaite, & Pechenizkiy,
2011) or in context-sensitive correction of phone recognition output (Levit,
Alshawi, Gorin, & Noth, 2003). Tree node relabeling ideas have also been used in
recognizing textual entailment (Heilman & Smith, 2010) and in probabilistic
context-free grammar parsing (Johnson, 1998), but these are unrelated to decision
tree learning. We are not aware of other approaches directly related to the
discussed idea of leaf relabeling in decision trees that are applicable to our setting.