Table 12.2 Sample relation for the income class example with positive class probability

Sex  Ethnicity   Highest Degree  Job Type    Class  Prob
m    native      university      board       +      .99
m    native      high school     board       +      .90
m    native      university      education   +      .92
m    non-native  university      healthcare  +      .76 *
m    non-native  none            healthcare  -      .44
f    non-native  high school     board       -      .09
f    native      university      education   -      .66 *
f    native      none            healthcare  +      .66
f    non-native  high school     education   -      .02
f    native      university      board       +      .92

(* rows set in bold in the original: the relabel candidates of Example 1)
this probability exceeds 0.5, the object is assigned to the positive class. The objects
close to the decision boundary are those with a probability close to 0.5. These
objects are selected first for relabeling.
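As a hypothetical illustration (not code from the chapter), the probabilities of Table 12.2 can be ranked by their distance to the 0.5 decision boundary, so that the most uncertain objects come first:

```python
# Positive-class probabilities from Table 12.2, in row order.
probs = [0.99, 0.90, 0.92, 0.76, 0.44, 0.09, 0.66, 0.66, 0.02, 0.92]

# A smaller |p - 0.5| means the object sits closer to the decision boundary.
ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
print(ranked[:3])  # indices of the three objects closest to the boundary
```

Here index 4 (the male with probability .44) is closest to the boundary, followed by the two objects with probability .66.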
Example 1. Consider again the dataset D given in Table 12.1. We want to learn
a classifier to predict the class of objects for which the predictions are
non-discriminatory towards Sex = f. In this example we rank the objects by their
positive class probability given by a Naive Bayes classification model. In
Table 12.2 the positive class probabilities as given by this ranker are added to
the table for reference (calculated using the "NBS" classifier of Weka (Hall et
al., 2009)).
In the second step, we arrange the data separately for female applicants with
class − in descending order and for male applicants with class + in ascending
order with respect to their positive class probability. Relabeling one promotion
candidate and one demotion candidate makes the data discrimination-free. Hence,
we relabel the top promotion candidate, that is, the highest scoring female with
a negative class label, and the top demotion candidate, that is, the lowest
scoring male with a positive class label (the bold examples in Table 12.2).
After the labels for these instances are changed, the discrimination decreases
from 40% to 0%. The resulting dataset is used as a training set for classifier
induction.
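The massaging step can be sketched as follows (a minimal illustration on the Table 12.2 rows, not the authors' code; each row is a (sex, class label, positive-class probability) triple):

```python
# Rows of Table 12.2 as (sex, class label, positive-class probability).
data = [
    ("m", "+", 0.99), ("m", "+", 0.90), ("m", "+", 0.92),
    ("m", "+", 0.76), ("m", "-", 0.44), ("f", "-", 0.09),
    ("f", "-", 0.66), ("f", "+", 0.66), ("f", "-", 0.02),
    ("f", "+", 0.92),
]

def discrimination(rows):
    """Difference in positive-class rates: P(+ | m) - P(+ | f)."""
    rate = lambda s: (sum(1 for r in rows if r[0] == s and r[1] == "+")
                      / sum(1 for r in rows if r[0] == s))
    return rate("m") - rate("f")

print(discrimination(data))  # 0.4, i.e. 40% discrimination

# Top promotion candidate: the highest-scoring female with a negative label.
promote = max((i for i, r in enumerate(data) if r[0] == "f" and r[1] == "-"),
              key=lambda i: data[i][2])
# Top demotion candidate: the lowest-scoring male with a positive label.
demote = min((i for i, r in enumerate(data) if r[0] == "m" and r[1] == "+"),
             key=lambda i: data[i][2])

data[promote] = (data[promote][0], "+", data[promote][2])
data[demote] = (data[demote][0], "-", data[demote][2])
print(discrimination(data))  # 0.0 after relabeling one pair
```

Swapping exactly one label in each group leaves the overall number of positive labels unchanged, which is why the positive class probability of the dataset is preserved.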
12.3.1.2 Reweighing and Resampling
The massaging approach is rather intrusive as it changes the class labels of the
objects. Our second approach does not have this disadvantage. Instead of
relabeling the objects, different weights are attached to them. For example, the
deprived community objects with X(Class) = + get higher weights than the
deprived community objects with X(Class) = −, and the favored community objects
with X(Class) = + get lower weights than the favored community objects with
X(Class) = −. We refer to this method as reweighing. Again we assume that we
want to reduce the discrimination to 0 while maintaining the overall positive
class probability. We now discuss the idea behind the weight calculation.
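One way to obtain such weights, sketched below under the assumption that each (Sex, Class) combination is weighted by its expected frequency (if Sex and Class were independent) divided by its observed frequency, is the following hypothetical illustration on the Table 12.2 counts (not the chapter's code):

```python
# Weight sketch: W(s, c) = P_expected(s, c) / P_observed(s, c), where the
# expected probability assumes independence of Sex and Class.
from collections import Counter

# (sex, class) pairs matching the counts in Table 12.2.
rows = [("m", "+")] * 4 + [("m", "-")] * 1 + [("f", "+")] * 2 + [("f", "-")] * 3

n = len(rows)
sex = Counter(s for s, _ in rows)   # marginal counts per sex
cls = Counter(c for _, c in rows)   # marginal counts per class
cell = Counter(rows)                # observed counts per (sex, class) cell

weights = {sc: sex[sc[0]] * cls[sc[1]] / (n * cell[sc]) for sc in cell}
for sc in sorted(weights):
    print(sc, round(weights[sc], 3))
```

On these counts the female objects with class + receive weight 1.5 against roughly 0.667 for the female objects with class −, while the male objects with class + receive 0.75 against 2.0 for the male objects with class −, matching the intuition described above.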