5. While the discrimination is greater than 0:
6. Calculate the number of positive class labels P' assigned by the model to the data-set.
7. If P' is greater than P, raise T+ by 0.01.
8. If P' is less than or equal to P, lower T− by 0.01.
9. Iterate (return to step 5).
10. Use the resulting decision thresholds to classify the test-set.
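A minimal sketch of this loop in Python (our own illustration, not the chapter's code): we read T+ as the decision threshold for males and T− as the threshold for females, consistent with the explanation below, and we assume the earlier steps have produced model scores and computed P, the number of positive labels in the data-set. All names are hypothetical.

```python
import numpy as np

def adjust_thresholds(scores, sex, y, step=0.01):
    """Sketch of steps 5-10: shift per-group thresholds until the
    discrimination (male positive rate minus female positive rate)
    is no longer positive."""
    P = int((y == 1).sum())          # positive labels in the data-set
    t_m = t_f = 0.5                  # assumed initial thresholds
    male, female = (sex == "m"), (sex == "f")

    def predict(t_m, t_f):
        pred = np.zeros(len(scores), dtype=int)
        pred[male] = (scores[male] >= t_m).astype(int)
        pred[female] = (scores[female] >= t_f).astype(int)
        return pred

    pred = predict(t_m, t_f)
    while pred[male].mean() - pred[female].mean() > 0:
        P_prime = int(pred.sum())    # positives assigned by the model
        if P_prime > P:
            t_m += step              # raise T+: fewer male positives
        else:
            t_f -= step              # lower T-: more female positives
        pred = predict(t_m, t_f)
    return t_m, t_f
```

Raising T+ removes male positives while lowering T− adds female positives, so the total number of positive labels stays roughly constant as the discrimination shrinks.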
The idea of this algorithm is to lower the threshold for females if the classifier assigns fewer positive class labels than the number of positive class labels in the data-set. Otherwise, we raise the decision threshold for males. In this way, we try to keep the number of positive class labels intact. One may note that since we want to keep this number intact, it is possible to pre-compute the number of males and females that should get a different class label in order to obtain a discrimination score of 0:
$$m_{\text{change}} = m_{\text{assigned}} - P(\text{positive class}) \cdot m_{\text{total}}$$
$$f_{\text{change}} = f_{\text{assigned}} - P(\text{positive class}) \cdot f_{\text{total}}$$
where $m_{\text{change}}$, $m_{\text{assigned}}$ and $m_{\text{total}}$ (and the corresponding $f$ terms) denote the change in the number of males (females) that receive a positive class label, the number of males (females) initially assigned a positive class, and the total number of males (females), respectively.
It is straightforward to set the decision thresholds to values that result in these
changes. Although this calculation is more efficient, we prefer using our algorithm
since it provides an overview of the different threshold settings possible between
the original and discrimination-free models. In addition to changing the decision
thresholds, we remove the sensitive attribute from the Naive Bayes model.
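For concreteness, a small example with made-up numbers: if $P(\text{positive class}) = 0.3$, $m_{\text{total}} = 600$, $f_{\text{total}} = 400$, $m_{\text{assigned}} = 220$ and $f_{\text{assigned}} = 80$, then $m_{\text{change}} = 220 - 0.3 \cdot 600 = 40$ and $f_{\text{change}} = 80 - 0.3 \cdot 400 = -40$: forty males should move from the positive to the negative class and forty females the other way, leaving the total of 300 positive labels intact.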
14.4.2 Two Naive Bayes Models
Using the above method, discrimination can be removed completely from a Naive
Bayes classifier. However, it does not actively try to avoid the red-lining effect.
Although the resulting classification is discrimination-free, this classification can
still depend on the sensitive attribute in an indirect way. In our second approach,
we try to avoid this dependence by removing all correlation with the sensitive
attribute from the data-set used to train the Naive Bayes classifier.
Removing all correlation with the sensitive attribute from the data-set seems difficult, but the solution actually is very simple. We divide the data-set into two sets, each containing people with only one of the sensitive values. Subsequently, we learn two Naive Bayes models from these two data-sets. In the banking example, we thus get one model for the male and one for the female population. The model for males still uses attributes correlated to gender for making its decisions, but since it has not been trained using data from females, these decisions are not based on the fact that females are less likely to get positive labels. The predictions made using these models are therefore independent of the sensitive attribute.
When classifying new people, we first select the appropriate model, and then use
that model to decide on the class label.⁵
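A sketch of this two-model scheme, again with hypothetical names; we use scikit-learn's CategoricalNB (which expects integer-encoded categorical features) purely for illustration:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

class TwoNaiveBayes:
    """One Naive Bayes model per value of the sensitive attribute."""

    def fit(self, X, s, y):
        # Split the data-set by the sensitive attribute and train one
        # model per group; s itself is never used as a feature.
        self.models = {v: CategoricalNB().fit(X[s == v], y[s == v])
                       for v in np.unique(s)}
        return self

    def predict(self, X, s):
        # Route each instance to the model trained on its own group.
        pred = np.empty(len(X), dtype=int)
        for v, model in self.models.items():
            mask = (s == v)
            if mask.any():
                pred[mask] = model.predict(X[mask])
        return pred
```

Note that the sensitive attribute is used only to select the model, never as an input feature, so neither model can exploit gender-correlated differences between the two populations.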
⁵ It has been suggested to swap these models, i.e., use the model learned using males to classify females and vice versa. In our opinion, this makes less sense since this approach