“male”). This setting represents the simplest possible situation and marks the starting point of the recent discrimination-aware research. For a discussion of more elaborate settings that build upon this base case but involve a more complex ecology of attributes, see Chapter 8 of this book.
First, we motivate the problem of discrimination-free classification by relating it to existing anti-discrimination laws that prohibit discrimination in housing, employment, financing, insurance, and wages on the basis of race, color, national origin, religion, sex, familial status, and disability (Section 12.2.1). For a more in-depth discussion of anti-discrimination and privacy legislation, we refer the interested reader to Chapter 4 of this book. Next, we give a measure for discrimination on which the problem of classification without discrimination will be based (Section 12.2.2). Then, we show how to learn accurate classifiers on discriminatory training data that do not discriminate in their future predictions (Section 12.3). In particular, we discuss three types of techniques that lead to discrimination-free classifiers. The three classes of techniques, and where in the classifier learning process they take place, are illustrated in Figure 12.1.
[Fig. 12.1 Graphical illustration of the three classes of discrimination-free techniques for classification. The pipeline runs from Input (training data) through Learning (inducing the classifier) to Output (predictive model). Techniques on the input data (Section 12.3.1): instance relabeling (massaging), reweighing and resampling, and rule hiding (see also Chapter 13). Techniques during learning (Section 12.3.2): discrimination-aware decision trees and EM for Bayesian networks (see also Chapter 14). Techniques on the output model (Section 12.3.3): leaf relabeling in decision trees and adjusting thresholds in naïve Bayes (see also Chapter 14).]
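As a rough sketch of the measure referred to in Section 12.2.2 (the formal definition is given there), discrimination in a labeled dataset or in a classifier's predictions is quantified as the difference in positive-label probability between the favored and the deprived group; in the running example,

    disc = P(successful | male) − P(successful | female),

and a dataset or classifier is considered discrimination-free when this difference is zero (or below a small tolerance).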
The first class of techniques removes the discrimination from the input data in one of three ways. The first is to selectively relabel some of the instances (we call this massaging); for instance, in the example above, some of the unsuccessful females could be relabeled as successful and some of the successful males as unsuccessful. The second is to resample the input data; that is, some of the successful males are removed from the input data and some of the successful females' records are duplicated. The third is to reweigh the data, that is, to assign higher weights to successful females and lower weights to successful males (Calders, Kamiran, & Pechenizkiy, 2009; Kamiran & Calders, 2009a). Another approach that belongs to this class is described in Chapter 13 of this book: based on a collection of discriminative rules detected by the discrimination discovery techniques described in Chapter 5 of this book, rule hiding techniques from privacy-preserving data mining (Chapter 11 of this book) are used to suppress the discriminative rules in the input data.
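To make the reweighing idea concrete, the following is a minimal sketch in Python of the standard reweighing scheme of Kamiran & Calders (2009a): every combination of sensitive value s and class label c receives the weight W(s, c) = P(s)P(c)/P(s, c), the ratio of the expected to the observed joint frequency, so that under the resulting weights the sensitive attribute and the class label are statistically independent. The toy data and variable names below are hypothetical and only illustrate the computation.

    # Minimal sketch (not the authors' code) of reweighing:
    # each (sensitive value, class label) pair gets weight
    #   W(s, c) = P(s) * P(c) / P(s, c),
    # so that the weighted data shows no dependency between the
    # sensitive attribute and the label.
    from collections import Counter

    def reweighing_weights(sensitive, labels):
        """Return one weight per instance: expected / observed joint frequency."""
        n = len(labels)
        count_s = Counter(sensitive)                # counts per sensitive value
        count_c = Counter(labels)                   # counts per class label
        count_sc = Counter(zip(sensitive, labels))  # joint counts
        return [count_s[s] * count_c[c] / (n * count_sc[(s, c)])
                for s, c in zip(sensitive, labels)]

    # Hypothetical toy data mirroring the text: males are "successful" more often.
    sex = ["m", "m", "m", "m", "f", "f", "f", "f"]
    successful = [1, 1, 1, 0, 1, 0, 0, 0]

    for s, c, w in zip(sex, successful, reweighing_weights(sex, successful)):
        print(s, c, round(w, 2))
    # Successful females get weight 2.0 and successful males 0.67, so a
    # weight-aware learner no longer sees sex and success as correlated.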