Table 12.1 Sample relation for the income class example

Sex   Ethnicity    Highest Degree   Job Type     Class
m     native       university       board        +
m     native       high school      board        +
m     native       university       education    +
m     non-native   university       healthcare   +
m     non-native   none             healthcare   -
f     non-native   high school      board        -
f     native       university       education    -
f     native       none             healthcare   +
f     non-native   high school      education    -
f     native       university       board        +
The second class of techniques is based on modifying the classifier learning
procedure itself. We show how a decision tree learning algorithm can be adapted to
induce discrimination-free predictive models. Technical details of this approach are
given in Kamiran et al. (2010a). Another approach in this class, a non-discriminating
Bayesian classifier, is described in Chapter 14 of this book.
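To illustrate how the learning procedure itself can be changed, the following Python sketch scores candidate splits on the rows of Table 12.1 with an assumed criterion: information gain with respect to Class minus information gain with respect to Sex, so that splits that reveal the sensitive attribute are penalized. The subtraction, the function names, and the choice of Sex as the sensitive column are assumptions for illustration; this section does not spell out the exact criterion used by Kamiran et al. (2010a).

```python
import math
from collections import Counter

# Rows of Table 12.1 as (sex, ethnicity, degree, job, class) tuples.
COLUMNS = ("sex", "ethnicity", "degree", "job", "class")
ROWS = [
    ("m", "native", "university", "board", "+"),
    ("m", "native", "high school", "board", "+"),
    ("m", "native", "university", "education", "+"),
    ("m", "non-native", "university", "healthcare", "+"),
    ("m", "non-native", "none", "healthcare", "-"),
    ("f", "non-native", "high school", "board", "-"),
    ("f", "native", "university", "education", "-"),
    ("f", "native", "none", "healthcare", "+"),
    ("f", "non-native", "high school", "education", "-"),
    ("f", "native", "university", "board", "+"),
]


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, split_col, target_col):
    """Entropy reduction of `target_col` when the rows are partitioned by `split_col`."""
    s, t = COLUMNS.index(split_col), COLUMNS.index(target_col)
    after = 0.0
    for value in {r[s] for r in rows}:
        part = [r[t] for r in rows if r[s] == value]
        after += len(part) / len(rows) * entropy(part)
    return entropy([r[t] for r in rows]) - after


def fair_split_score(rows, split_col, sensitive="sex", target="class"):
    """Assumed criterion: gain on the class minus gain on the sensitive attribute."""
    return info_gain(rows, split_col, target) - info_gain(rows, split_col, sensitive)


def best_split(rows, candidates):
    """Candidate attribute with the highest discrimination-aware score."""
    return max(candidates, key=lambda col: fair_split_score(rows, col))


for col in ("sex", "ethnicity", "degree", "job"):
    print(f"{col:10s} IGC={info_gain(ROWS, col, 'class'):.3f} "
          f"IGS={info_gain(ROWS, col, 'sex'):.3f} "
          f"score={fair_split_score(ROWS, col):.3f}")
print("best split:", best_split(ROWS, ["sex", "ethnicity", "degree", "job"]))
```

On this toy table, splitting on Sex itself scores about 0.12 - 1.00, roughly -0.88, and is never chosen, while Ethnicity, whose two values each contain equally many males and females, gains nothing about Sex and wins with a class gain of about 0.26.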
The third class of techniques is based on post-processing the learnt models. We
explain a leaf-relabeling approach that turns a decision tree, already induced on
biased historical data with an off-the-shelf algorithm such as C4.5, into a
discrimination-free one (Kamiran et al., 2010b). Another technique in this class,
designed for Bayesian models, is presented in Chapter 14 of this book.
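The post-processing idea can be sketched on a hypothetical one-level tree that splits Table 12.1 on Highest Degree, with each leaf labeled by its majority class (the tie in the "none" leaf is broken towards "-", an arbitrary choice). The greedy rule below, which repeatedly flips the leaf that removes the most discrimination per unit of accuracy lost until the discrimination is at most a threshold epsilon, is an assumption for illustration, as is the discrimination measure (positive-prediction rate of males minus that of females). It is not presented as the exact procedure of Kamiran et al. (2010b).

```python
import math
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str
    label: str  # current prediction: "+" or "-"
    pos: int    # training instances in the leaf with class "+"
    neg: int    # training instances in the leaf with class "-"
    fav: int    # instances from the favored community (males)
    dep: int    # instances from the deprived community (females)


def degree_tree_leaves():
    """Hypothetical tree: Table 12.1 split once on Highest Degree."""
    return [
        Leaf("university", "+", pos=4, neg=1, fav=3, dep=2),
        Leaf("high school", "-", pos=1, neg=2, fav=1, dep=2),
        Leaf("none", "-", pos=1, neg=1, fav=1, dep=1),  # tie broken towards "-"
    ]


def accuracy(leaves):
    correct = sum(l.pos if l.label == "+" else l.neg for l in leaves)
    return correct / sum(l.pos + l.neg for l in leaves)


def discrimination(leaves):
    """Assumed measure: P(pred = + | male) - P(pred = + | female)."""
    n_fav = sum(l.fav for l in leaves)
    n_dep = sum(l.dep for l in leaves)
    fav_pos = sum(l.fav for l in leaves if l.label == "+")
    dep_pos = sum(l.dep for l in leaves if l.label == "+")
    return fav_pos / n_fav - dep_pos / n_dep


def flip_effect(leaf, leaves):
    """(change in discrimination, change in accuracy) if `leaf` is relabeled."""
    n = sum(l.pos + l.neg for l in leaves)
    n_fav = sum(l.fav for l in leaves)
    n_dep = sum(l.dep for l in leaves)
    sign = 1 if leaf.label == "-" else -1  # "-" -> "+" adds positive predictions
    d_disc = sign * (leaf.fav / n_fav - leaf.dep / n_dep)
    d_acc = sign * (leaf.pos - leaf.neg) / n
    return d_disc, d_acc


def relabel(leaves, epsilon=0.0):
    """Greedily flip leaves until discrimination <= epsilon; return flipped names."""
    flipped = []
    while discrimination(leaves) > epsilon:
        best, best_ratio = None, -math.inf
        for leaf in leaves:
            if leaf.name in flipped:
                continue
            d_disc, d_acc = flip_effect(leaf, leaves)
            if d_disc >= 0:
                continue  # flipping this leaf would not reduce discrimination
            loss = -d_acc
            ratio = -d_disc / loss if loss > 0 else math.inf
            if ratio > best_ratio:
                best, best_ratio = leaf, ratio
        if best is None:
            break  # no remaining flip reduces discrimination
        best.label = "+" if best.label == "-" else "-"
        flipped.append(best.name)
    return flipped


leaves = degree_tree_leaves()
print(f"before: acc={accuracy(leaves):.2f} disc={discrimination(leaves):.2f}")
print("flipped:", relabel(leaves))
print(f"after:  acc={accuracy(leaves):.2f} disc={discrimination(leaves):.2f}")
```

Here the tree starts with accuracy 0.7 and discrimination 0.2. Flipping the "high school" leaf to "+" removes all of the discrimination at the cost of one correctly classified instance (accuracy 0.6), whereas flipping "university" would remove the same amount of discrimination but cost three.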
We illustrate the behavior of these different types of techniques in Section 12.4
using the well-known Adult dataset (Frank & Asuncion, 2010). The goal associated
with this dataset is to predict, for promotional purposes, whether a person falls into
the high or the low income class. The dataset, however, exhibits a significant gender
gap with respect to income: substantially fewer females than males have a high
income. Nevertheless, as sketched in the example above, we want to learn a classifier
that is gender-neutral. The sensitive attribute is thus gender, the deprived community
is the females, and the favored community is the males. We show that the discussed
techniques clearly outperform the traditional classification approaches for this task:
without trading in too much accuracy, they reduce the discrimination in the learnt
classifier's predictions to an acceptable level.
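One simple way to quantify a gender gap such as the one described above is the difference between the fraction of males and the fraction of females assigned the high-income class. The Python sketch below applies this measure, which is an assumption here rather than a definition taken from this section, to the ten rows of Table 12.1; the same function can score either the class column or a classifier's predictions.

```python
from typing import NamedTuple


class Row(NamedTuple):
    sex: str
    ethnicity: str
    degree: str
    job: str
    label: str  # "+" = high income, "-" = low income


# The ten rows of Table 12.1.
TABLE_12_1 = [
    Row("m", "native", "university", "board", "+"),
    Row("m", "native", "high school", "board", "+"),
    Row("m", "native", "university", "education", "+"),
    Row("m", "non-native", "university", "healthcare", "+"),
    Row("m", "non-native", "none", "healthcare", "-"),
    Row("f", "non-native", "high school", "board", "-"),
    Row("f", "native", "university", "education", "-"),
    Row("f", "native", "none", "healthcare", "+"),
    Row("f", "non-native", "high school", "education", "-"),
    Row("f", "native", "university", "board", "+"),
]


def positive_rate(labels):
    """Fraction of "+" labels in a non-empty sequence."""
    labels = list(labels)
    return sum(1 for y in labels if y == "+") / len(labels)


def discrimination(rows, predictions=None, favored="m", deprived="f"):
    """Assumed measure: P(+ | favored) - P(+ | deprived).

    Scores the class column by default, or a classifier's `predictions`
    (one "+"/"-" per row) when they are supplied.
    """
    labels = [r.label for r in rows] if predictions is None else list(predictions)
    fav = [y for r, y in zip(rows, labels) if r.sex == favored]
    dep = [y for r, y in zip(rows, labels) if r.sex == deprived]
    return positive_rate(fav) - positive_rate(dep)


print(discrimination(TABLE_12_1))                          # class column
print(discrimination(TABLE_12_1, ["+"] * len(TABLE_12_1)))  # everyone predicted "+"
```

In Table 12.1, 4 of the 5 males but only 2 of the 5 females are labeled "+", giving a discrimination of 0.8 - 0.4 = 0.4. A classifier that predicts "+" for everyone scores 0 on this measure while ignoring the data entirely, which is why discrimination has to be traded off against accuracy.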
12.2 Problem Statement: Discrimination-Aware Classification
The input to our problem consists of a dataset in tabular format, such as the one
in Table 12.1. Every row in the table represents one instance, and a special column,
Class, holds the class label that we need to learn to predict for new instances. From
this dataset, we want to learn a model that predicts the class of a previously unseen
instance from its other attributes. Further-