\[
\mathrm{disc}_{S=f}(C, D_{\mathrm{test}}) :=
\frac{\left|\{X \in D_{\mathrm{test}} \mid X(S) = m,\ C(X) = +\}\right|}{\left|\{X \in D_{\mathrm{test}} \mid X(S) = m\}\right|}
\;-\;
\frac{\left|\{X \in D_{\mathrm{test}} \mid X(S) = f,\ C(X) = +\}\right|}{\left|\{X \in D_{\mathrm{test}} \mid X(S) = f\}\right|}
\]
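As a concrete illustration, the measure above can be computed directly from a labelled test set and the classifier's predictions. The sketch below is a minimal example, assuming a hypothetical list-of-dictionaries data layout with sensitive values m/f and class labels +/−; the function name and arguments are placeholders, not the chapter's own code.

```python
def discrimination(records, predictions, sensitive="S", favored="m", deprived="f"):
    """Compute disc_{S=f}(C, D_test): the positive-prediction rate of the
    favored group minus that of the deprived group."""
    pos_fav = sum(1 for r, p in zip(records, predictions)
                  if r[sensitive] == favored and p == "+")
    n_fav = sum(1 for r in records if r[sensitive] == favored)
    pos_dep = sum(1 for r, p in zip(records, predictions)
                  if r[sensitive] == deprived and p == "+")
    n_dep = sum(1 for r in records if r[sensitive] == deprived)
    return pos_fav / n_fav - pos_dep / n_dep

# Example: 3 of 4 favored vs. 1 of 4 deprived instances predicted positive
D_test = [{"S": s} for s in ["m", "m", "m", "m", "f", "f", "f", "f"]]
C_pred = ["+", "+", "+", "-", "+", "-", "-", "-"]
print(discrimination(D_test, C_pred))  # 0.75 - 0.25 = 0.5
```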
12.3 Techniques for Discrimination-Free Classification
In this section we discuss different techniques for discrimination-aware classification. First, we discuss data pre-processing techniques to make the training data unbiased before learning a classifier. Second, we discuss the adaptation of the classifier learning procedure itself to make it discrimination-free. Third, we discuss the modification of the post-processing phase of a learnt classifier to make it unbiased.
12.3.1 Pre-processing Techniques
The first kind of solution is based on removing the discrimination from the training dataset. If we can remove discrimination directly from the source data, a classifier can be learnt on a cleaned, discrimination-free dataset. Our rationale for this approach is that, since the classifier is trained on discrimination-free data, it is likely that its predictions will be (more) discrimination-free as well, as the classifier will no longer generalize the discrimination. The first approach we discuss here is called massaging the data (Kamiran & Calders, 2009a). It is based on changing the class labels in order to remove the discrimination from the training data. The second approach is less intrusive as it does not change the class labels in the training data. Instead, weights are assigned to the data objects to make the dataset discrimination-free. This approach is called reweighing (Calders et al., 2009). Since reweighing requires the learner to be able to work with weighted tuples, we propose another variant, in which we re-sample the dataset in such a way that the discrimination is removed. We refer to this approach as Sampling (Kamiran & Calders, 2010).
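To make the reweighing idea concrete before its detailed treatment, the sketch below computes, for each combination of sensitive-attribute value and class label, the weight that would make the sensitive attribute independent of the class when applied to the training tuples, following the expected-over-observed-frequency idea of Calders et al. (2009). The function name and data layout are hypothetical placeholders, not the chapter's own code.

```python
from collections import Counter

def reweighing_weights(data, sensitive="S", label="Class"):
    """Weight for each (s, c) cell: expected frequency under independence
    of S and Class, divided by the observed frequency in the data."""
    n = len(data)
    s_counts = Counter(r[sensitive] for r in data)
    c_counts = Counter(r[label] for r in data)
    sc_counts = Counter((r[sensitive], r[label]) for r in data)
    return {
        (s, c): (s_counts[s] * c_counts[c]) / (n * sc_counts[(s, c)])
        for (s, c) in sc_counts
    }

# Deprived-group positives and favored-group negatives are under-represented,
# so they receive weights > 1; the other two cells receive weights < 1.
train = ([{"S": "m", "Class": "+"}] * 3 + [{"S": "m", "Class": "-"}] * 1
         + [{"S": "f", "Class": "+"}] * 1 + [{"S": "f", "Class": "-"}] * 3)
print(reweighing_weights(train))
```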
12.3.1.1 Massaging
In massaging we change the class labels in the training set: some objects of the deprived community change from class − to +, and the same number of objects of the favored community change from + to −. In this way the discrimination decreases, yet the overall class distribution is maintained; the same number of people has the positive class as before. This strategy reduces the discrimination to the desirable level with the least number of changes to the dataset while keeping the overall class distribution fixed. Notice that we do not randomly pick the objects to relabel. Instead, first we learn a regular, possibly discriminative (i.e., not discrimination-free) classifier. This classifier, although not acceptable as a final result, still provides useful information. Based on this classifier we can see, for the deprived and favored communities separately, which instances are closest to the decision boundary. Many classifiers assign a probability of being in the positive class to the instances, and if so, these probabilities can be used to rank the instances by their closeness to the decision boundary.
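As a hedged illustration of this relabelling step, the sketch below uses a ranker's positive-class probabilities to promote the deprived-group negatives closest to the decision boundary and to demote the same number of borderline favored-group positives. The variable names, scoring model, and the choice of how many labels to flip are assumptions made for the example, not prescribed by the chapter.

```python
def massage(data, scores, n_flips, sensitive="S", label="Class",
            deprived="f", favored="m"):
    """Flip n_flips labels in each group: promote the highest-scored
    deprived negatives and demote the lowest-scored favored positives."""
    promote = sorted(
        (i for i, r in enumerate(data)
         if r[sensitive] == deprived and r[label] == "-"),
        key=lambda i: scores[i], reverse=True)[:n_flips]
    demote = sorted(
        (i for i, r in enumerate(data)
         if r[sensitive] == favored and r[label] == "+"),
        key=lambda i: scores[i])[:n_flips]
    massaged = [dict(r) for r in data]  # leave the input data untouched
    for i in promote:
        massaged[i][label] = "+"
    for i in demote:
        massaged[i][label] = "-"
    return massaged

# Example: one flip per group, using hypothetical ranker scores P(Class=+ | X)
train = [{"S": "f", "Class": "-"}, {"S": "f", "Class": "-"},
         {"S": "m", "Class": "+"}, {"S": "m", "Class": "+"}]
scores = [0.45, 0.20, 0.55, 0.90]
print(massage(train, scores, n_flips=1))
```

Because the same number of labels is flipped in each group, the overall class distribution stays fixed while the discrimination measure of the training data decreases, in line with the strategy described above.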