discrimination and 86.01% accuracy, even though the sensitive attribute was not used
at prediction time. We observe in our experiments that learning a decision tree
with a modified splitting criterion, that is, using the second type of discrimination-
aware classification alone, does not significantly reduce the discrimination. However,
when the decision trees are learnt on cleaner data obtained with the discrimination-
aware pre-processing techniques, the discrimination is reduced to 3.32% while
keeping the accuracy at 84.44%. In our experiments the decision trees with leaf
relabeling were able to reduce the discrimination to 0% while keeping a reasonably
high accuracy. Figure 12.3 also shows that our proposed methods outperform the
discrimination-aware Naïve Bayes model of Chapter 14 of this book with respect to
the accuracy-discrimination trade-off.
12.5 Discussion and Conclusion
In this chapter we discussed the idea of discrimination-aware classification and in-
troduced a procedural way to calculate the discrimination in a given dataset and in
the predictions of a classifier. We also discussed three types of techniques for learning
discrimination-free classifiers: data preprocessing techniques, an adapted classifier
learning procedure, and an approach for post-processing a learnt decision tree by
changing the labels of some of its leaves to make the final predictive model
discrimination-free. Finally, we presented empirical validation results showing that
the discrimination-aware classification methods predict labels for previously unseen
data objects with no or significantly lower discrimination and with minimal loss of
accuracy.
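For concreteness, the sketch below shows one way such a discrimination measure can be computed as the difference in positive-outcome rates between the two groups. It assumes a binary sensitive attribute coded as 0 for the favored group and 1 for the deprived group, and labels coded as 1 for the positive outcome; the function name and this encoding are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def discrimination(y, s):
    """Difference between the positive-outcome rate of the favored group
    (s == 0) and that of the deprived group (s == 1).

    y : array-like of 0/1 labels (dataset labels or classifier predictions)
    s : array-like of 0/1 values of the sensitive attribute
    """
    y, s = np.asarray(y), np.asarray(s)
    return y[s == 0].mean() - y[s == 1].mean()

# The same function applies to a dataset (pass its actual labels) and to a
# classifier (pass its predictions on held-out data), e.g.:
#   disc_data  = discrimination(y_train, s_train)
#   disc_model = discrimination(clf.predict(X_test), s_test)   # clf is hypothetical
```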
Depending on the situation, one of the proposed techniques may be better suited than
another. First of all, if none of the other attributes is correlated with the sensitive attribute,
it clearly suffices to just remove this attribute. Unfortunately, this is seldom the
case, and even when it is, no guarantees can be given that no such correlations
exist. The presented preprocessing techniques have the advantage that they make the in-
put data discrimination-free, after which it can be used by any classification algorithm,
yet they have the disadvantage of giving no guarantee about the degree of discrimination
in the final classifier. The model post-processing techniques do not have this dis-
advantage; in principle the post-processing is continued until a discrimination-free
classifier (on a validation set) is obtained. The model post-processing techniques, as
well as the learner adaptation techniques, in turn have the disadvan-
tage of being model- and even algorithm-specific; for every classifier new algorithms
will have to be invented. The experiments further showed that the learner
adaptation approach did not work as expected unless it was combined with the
post-processing techniques. This surprising failure calls for more research to better
understand its causes.
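To make the post-processing idea more concrete, the following is a minimal sketch of a greedy relabeling loop, assuming the effect of flipping each leaf's label on validation-set discrimination and accuracy has been precomputed. The data structure, field names, and the ratio-based selection rule are illustrative assumptions rather than the chapter's exact procedure; the only property carried over from the text is that relabeling continues until the validation-set discrimination reaches the target.

```python
from dataclasses import dataclass

@dataclass
class LeafFlip:
    """Precomputed effect of flipping one leaf's predicted label,
    measured on a validation set (illustrative structure)."""
    leaf_id: int
    disc_drop: float  # reduction in discrimination if this leaf is relabeled
    acc_drop: float   # loss in accuracy if this leaf is relabeled

def relabel_until_fair(flips, current_disc, target_disc=0.0):
    """Greedily relabel leaves with the best discrimination reduction per
    unit of accuracy lost, until validation-set discrimination reaches
    the target. Returns the ids of the relabeled leaves."""
    chosen = []
    # Prefer flips that buy the most fairness for the least accuracy.
    for flip in sorted(flips,
                       key=lambda f: f.disc_drop / max(f.acc_drop, 1e-9),
                       reverse=True):
        if current_disc <= target_disc:
            break
        chosen.append(flip.leaf_id)
        current_disc -= flip.disc_drop
    return chosen
```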
Despite showing some promising results on discrimination-free classifier con-
struction, our study is far from complete. For instance, often there is a much more
complex ecology of attributes than what is assumed in the chapter. In the chapter