Clearly, there is some discrimination: the positive class probability of males (0.8)
is much higher than that of females (0.4). Initially, we set the distribution over
the latent labels equal to the distribution over the class labels, keeping the
discrimination intact:
          Latent positive            Latent negative
          Low income  High income    Low income  High income
Female         0          20             30           0
Male           0          40             10           0
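The two probabilities quoted above can be recomputed from the counts in the table. The following is a minimal sketch; the variable and function names are illustrative, and using the difference between the two rates as a discrimination score is an assumption made here for the example.

```python
# Class-label counts per gender, read from the table above
# (high income is the positive class).
counts = {
    "female": {"low": 30, "high": 20},
    "male": {"low": 10, "high": 40},
}

def positive_rate(group):
    """Fraction of rows in the group that carry the positive class label."""
    c = counts[group]
    return c["high"] / (c["low"] + c["high"])

p_female = positive_rate("female")
p_male = positive_rate("male")

# Assumed score: gap between the two positive class probabilities.
discrimination = p_male - p_female

print(p_female, p_male, discrimination)
```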
Next, we rectify this situation by taking occurrence counts away from the males
with positive latent values and giving them negative latent values. We do the
opposite for females. Since we want the number of rows with positive actual
(non-discriminatory) labels to equal the number of rows with positive labels in
the data, the number of changes we need to make is unique and easy to compute.
In the example it is 10: the data contain 60 positive labels among 100 rows, so
each gender (50 rows) should receive 30 positive latent values. This results in
the following distribution:
          Latent positive            Latent negative
          Low income  High income    Low income  High income
Female        10          20             20           0
Male           0          30             10          10
In this table, both males and females have a probability of 0.6 (30 out of 50 rows)
of obtaining a positive latent value. The latent values are therefore
discrimination-free. We use these counts to determine the probability table
P(C | L, S) in the latent variable model.
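The rebalancing step and the resulting table P(C | L, S) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names and the nested-dictionary layout are hypothetical, each group is given the overall positive rate as its target, and the required shift is assumed to be a whole number of rows that the group can actually supply.

```python
from fractions import Fraction

# Class-label counts per gender from the example (high income = positive class).
class_counts = {
    "female": {"low": 30, "high": 20},
    "male": {"low": 10, "high": 40},
}

def rebalance(class_counts):
    """Return latent[gender][latent_label][income] with equal positive latent rates."""
    total = sum(c["low"] + c["high"] for c in class_counts.values())
    positives = sum(c["high"] for c in class_counts.values())
    target_rate = Fraction(positives, total)  # overall positive rate

    latent = {}
    for gender, c in class_counts.items():
        # Start with latent labels identical to the class labels.
        table = {"pos": {"low": 0, "high": c["high"]},
                 "neg": {"low": c["low"], "high": 0}}
        # Rows that must gain (> 0) or lose (< 0) a positive latent label.
        shift = target_rate * (c["low"] + c["high"]) - c["high"]
        if shift > 0:
            # Too few positives: give low-income rows a positive latent label.
            table["neg"]["low"] -= shift
            table["pos"]["low"] += shift
        else:
            # Too many positives: give high-income rows a negative latent label.
            table["pos"]["high"] += shift
            table["neg"]["high"] -= shift
        latent[gender] = table
    return latent

def class_given_latent(latent):
    """Estimate P(C = high | L, S) from the latent count table."""
    probs = {}
    for gender, table in latent.items():
        for label, row in table.items():
            n = row["low"] + row["high"]
            probs[(label, gender)] = Fraction(row["high"]) / n if n else None
    return probs

latent = rebalance(class_counts)
probs = class_given_latent(latent)
```

With the example counts, `latent` reproduces the second table, and `probs` holds one entry per combination of latent label and gender, for instance the probability of high income for a female with a positive latent value.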
14.4.4 Comparing the Three Methods
To test the three Naive Bayes approaches for discrimination-free classification,
we performed tests on both artificial and real-world data (Calders & Verwer,
2010). We used the latent variable model to generate the artificial datasets. A
big advantage of this artificial data is that we can also generate the actual
class labels that should have been assigned to the rows when there is no
discrimination. These labels are then used to test the accuracy of the
classifiers. When using real-world data, we do not have the luxury of a
discrimination-free test set.
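A minimal sketch of scoring a classifier against discrimination-free labels, together with a measure of how strongly its predictions still depend on gender. The toy data, the function names, and the choice of the difference in positive-prediction rates as the discrimination measure are all assumptions made for illustration; they are not taken from the experiments.

```python
def accuracy(predicted, true_labels):
    """Fraction of rows whose prediction matches the discrimination-free label."""
    matches = sum(p == t for p, t in zip(predicted, true_labels))
    return matches / len(true_labels)

def discrimination(predicted, genders, favoured="male"):
    """Positive-prediction rate of the favoured group minus that of all other rows."""
    fav = [p for p, g in zip(predicted, genders) if g == favoured]
    rest = [p for p, g in zip(predicted, genders) if g != favoured]
    return sum(fav) / len(fav) - sum(rest) / len(rest)

# Hypothetical toy data: 1 = positive class, 0 = negative class.
genders = ["male", "male", "male", "male",
           "female", "female", "female", "female"]
true_labels = [1, 1, 0, 0, 1, 1, 0, 0]   # assumed discrimination-free labels
predicted = [1, 1, 1, 0, 1, 0, 0, 0]     # assumed classifier output

acc = accuracy(predicted, true_labels)
disc = discrimination(predicted, genders)
print(acc, disc)
```

Reporting both numbers side by side makes the trade-off visible: a classifier can be compared on how much accuracy it gives up for a given reduction in the gap between groups.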
When performing such experiments with discrimination-aware methods, one
should test at least the following quantities: the loss in accuracy and the amount of
remaining discrimination. One always has to make a trade-off between these two
values since discrimination can only be decreased by sacrificing accuracy. The main
conclusions from experiments in (Calders & Verwer, 2010) are that our second
threshold modifying method performs best, achieving zero discrimination with high
 