Techniques for Discrimination-Free Predictive Models - Discrimination and Privacy in the Information Society


Table 12.1 Sample relation for the income class example

Sex  Ethnicity   Highest Degree  Job Type    Class
m    native      university      board       +
m    native      high school     board       +
m    native      university      education   +
m    non-native  university      healthcare  +
m    non-native  none            healthcare  -
f    non-native  high school     board       -
f    native      university      education   -
f    native      none            healthcare  +
f    non-native  high school     education   -
f    native      university      board       +

The second class of techniques is based upon modifying the classifier learning procedure itself. We show how a decision tree learning algorithm can be adapted to induce discrimination-free predictive models. Technical details of this approach can be found in (Kamiran et al., 2010a). Another approach that belongs to this class, a non-discriminating Bayesian classifier, can be found in Chapter 14 of this book.

The third class of techniques is based upon post-processing the learnt models. We explain a decision tree leaf relabeling approach that takes a decision tree already induced on biased historical data with an off-the-shelf learner such as C4.5 and makes it discrimination-free (Kamiran et al., 2010b). Another technique in this class, but for Bayesian models, is presented in Chapter 14 of this book.

We illustrate the behavior of these different types of techniques in Section 12.4 using the well-known Adult dataset (Frank & Asuncion, 2010). The goal associated with this dataset is to predict, for promotional purposes, whether a person falls into the high or the low income class. The dataset, however, exhibits a significant gender gap with respect to income: there are substantially fewer females with a high income than males. Nevertheless, as sketched in the example above, we want to learn a classifier that is gender-neutral. The sensitive attribute is thus gender, the deprived community is the females, and the favored community is the males. We show that the discussed techniques clearly outperform the traditional classification approaches for this task: without giving up too much accuracy, the discrimination in the learnt classifier's predictions is reduced to an acceptable level.
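To make the notion of a gender gap concrete, the following minimal sketch computes it on the small relation of Table 12.1 rather than on the Adult dataset. It rests on an assumption that is not stated in this section: the gap is measured as the difference between the fraction of males and the fraction of females carrying the + class label. The names TABLE_12_1, positive_rate, and gender_gap are hypothetical and chosen only for illustration.

```python
# Rows of Table 12.1: (sex, ethnicity, highest degree, job type, class)
TABLE_12_1 = [
    ("m", "native", "university", "board", "+"),
    ("m", "native", "high school", "board", "+"),
    ("m", "native", "university", "education", "+"),
    ("m", "non-native", "university", "healthcare", "+"),
    ("m", "non-native", "none", "healthcare", "-"),
    ("f", "non-native", "high school", "board", "-"),
    ("f", "native", "university", "education", "-"),
    ("f", "native", "none", "healthcare", "+"),
    ("f", "non-native", "high school", "education", "-"),
    ("f", "native", "university", "board", "+"),
]


def positive_rate(rows, sex):
    """Fraction of the rows with the given sex that carry the '+' label."""
    group = [row for row in rows if row[0] == sex]
    return sum(1 for row in group if row[4] == "+") / len(group)


def gender_gap(rows):
    """Assumed illustrative measure: positive rate of the favored
    community (m) minus that of the deprived community (f)."""
    return positive_rate(rows, "m") - positive_rate(rows, "f")


if __name__ == "__main__":
    print("males with +:  ", positive_rate(TABLE_12_1, "m"))
    print("females with +:", positive_rate(TABLE_12_1, "f"))
    print("gap:           ", gender_gap(TABLE_12_1))
```

On this sample, four of the five males and two of the five females are labeled +, so the measure is positive. The same function could be applied to a classifier's predicted labels instead of the recorded ones, to check whether the predictions still favor one group.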

12.2 Problem Statement: Discrimination-Aware Classification

The input to our problem consists of a dataset in tabular format, such as the one in Table 12.1. Every row in the table represents one instance, and there is a special column, Class, indicating the class label that we need to learn to predict for new instances. From this dataset, a model is to be learnt that can predict the class of a previously unseen instance based upon its other attributes. Further-
