5. While the discrimination is greater than 0:
6. Calculate the number of positive class labels P' assigned by the model to the data-set.
7. If P' is greater than P, raise T+ by 0.01.
8. If P' is less than or equal to P, lower T− by 0.01.
9. Iterate (return to step 5).
10. Use the resulting decision thresholds to classify the test-set.
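A minimal sketch of this loop in Python (our own illustration, not the chapter's code): we read T+ as the decision threshold for males and T− as the threshold for females, consistent with the explanation below, and we assume the earlier steps have produced model scores and computed P, the number of positive labels in the data-set. All names are hypothetical.

```python
import numpy as np

def adjust_thresholds(scores, sex, y, step=0.01):
    """Sketch of steps 5-10: shift per-group thresholds until the
    discrimination (male positive rate minus female positive rate)
    is no longer positive."""
    P = int((y == 1).sum())          # positive labels in the data-set
    t_m = t_f = 0.5                  # assumed initial thresholds
    male, female = (sex == "m"), (sex == "f")

    def predict(t_m, t_f):
        pred = np.zeros(len(scores), dtype=int)
        pred[male] = (scores[male] >= t_m).astype(int)
        pred[female] = (scores[female] >= t_f).astype(int)
        return pred

    pred = predict(t_m, t_f)
    while pred[male].mean() - pred[female].mean() > 0:
        P_prime = int(pred.sum())    # positives assigned by the model
        if P_prime > P:
            t_m += step              # raise T+: fewer male positives
        else:
            t_f -= step              # lower T-: more female positives
        pred = predict(t_m, t_f)
    return t_m, t_f
```

Raising T+ removes male positives while lowering T− adds female positives, so the total number of positive labels stays roughly constant as the discrimination shrinks.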
The idea of this algorithm is to lower the threshold for females if the classifier assigns fewer positive class labels than the number of positive class labels in the data-set. Otherwise, we raise the decision threshold for males. In this way, we try to keep the number of positive class labels intact. One may note that since we want to keep this number intact, it is possible to pre-compute the number of males and females that should get a different class label in order to obtain a discrimination score of 0:
$$m_{\text{change}} = m_{\text{assigned}} - P(\text{positive class}) \cdot m_{\text{total}}$$
$$f_{\text{change}} = f_{\text{assigned}} - P(\text{positive class}) \cdot f_{\text{total}}$$
where $m_{\text{change}}$, $m_{\text{assigned}}$ and $m_{\text{total}}$ (and the corresponding $f$ terms) denote the change in the number of males (females) that receive a positive class label, the number of males (females) initially assigned a positive class, and the total number of males (females), respectively.
It is straightforward to set the decision thresholds to values that result in these
changes. Although this calculation is more efficient, we prefer using our algorithm
since it provides an overview of the different threshold settings possible between
the original and discrimination-free models. In addition to changing the decision
thresholds, we remove the sensitive attribute from the Naive Bayes model.
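For concreteness, a small example with made-up numbers: if $P(\text{positive class}) = 0.3$, $m_{\text{total}} = 600$, $f_{\text{total}} = 400$, $m_{\text{assigned}} = 220$ and $f_{\text{assigned}} = 80$, then $m_{\text{change}} = 220 - 0.3 \cdot 600 = 40$ and $f_{\text{change}} = 80 - 0.3 \cdot 400 = -40$: forty males should move from the positive to the negative class and forty females the other way, leaving the total of 300 positive labels intact.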
14.4.2 Two Naive Bayes Models
Using the above method, discrimination can be removed completely from a Naive
Bayes classifier. However, it does not actively try to avoid the red-lining effect.
Although the resulting classification is discrimination-free, this classification can
still depend on the sensitive attribute in an indirect way. In our second approach,
we try to avoid this dependence by removing all correlation with the sensitive
attribute from the data-set used to train the Naive Bayes classifier.
Removing all correlation with the sensitive attribute from the data-set seems difficult, but the solution actually is very simple. We divide the data-set into two sets, each containing people with only one of the sensitive values. Subsequently, we learn two Naive Bayes models from these two data-sets. In the banking example, we thus get one model for the male and one for the female population. The model for males still uses attributes correlated to gender for making its decisions, but since it has not been trained using data from females, these decisions are not based on the fact that females are less likely to get positive labels. The predictions made using these models are therefore independent of the sensitive attribute.
When classifying new people, we first select the appropriate model, and then use
that model to decide on the class label.⁵
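A sketch of this two-model scheme, again with hypothetical names; we use scikit-learn's CategoricalNB (which expects integer-encoded categorical features) purely for illustration:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

class TwoNaiveBayes:
    """One Naive Bayes model per value of the sensitive attribute."""

    def fit(self, X, s, y):
        # Split the data-set by the sensitive attribute and train one
        # model per group; s itself is never used as a feature.
        self.models = {v: CategoricalNB().fit(X[s == v], y[s == v])
                       for v in np.unique(s)}
        return self

    def predict(self, X, s):
        # Route each instance to the model trained on its own group.
        pred = np.empty(len(X), dtype=int)
        for v, model in self.models.items():
            mask = (s == v)
            if mask.any():
                pred[mask] = model.predict(X[mask])
        return pred
```

Note that the sensitive attribute is used only to select the model, never as an input feature, so neither model can exploit gender-correlated differences between the two populations.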
⁵ It has been suggested to swap these models, i.e., use the model learned using males to classify females and vice versa. In our opinion, this makes less sense since this approach