accuracy. In addition, the expectation maximization algorithm has problems converging to a good-quality solution with zero discrimination. In fact, during later iterations it often finds solutions that are worse, both in terms of discrimination and accuracy, than solutions found earlier. This surprising behavior of the EM algorithm still needs further investigation. For a more detailed overview and discussion of these results, the reader is referred to (Calders & Verwer, 2010).
14.5 A Note on Positive Discrimination
Although discrimination-aware data-mining is, in our opinion, necessary, one should be aware that it not only decreases the accuracy of data-mining, but is also likely to introduce positive discrimination. For instance, if we repeat the final analysis from Section 3 on the results obtained using our first threshold-modifying method (applied until zero discrimination) on the census income data-set, we obtain the following counts for the people who should receive a high income according to the test-set:
Table 14.9 The gender vs. predicted income contingency table for high-income test cases, assigned by a Naive Bayes classifier with modified decision thresholds

             Low income    High income
Female              101            489
Male               1763           1493
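From this table we can read off, per gender, the probability of being falsely denied a high income. The following minimal Python sketch illustrates the computation (the counts are taken from Table 14.9; the variable names are ours):

# Per-gender probability of being falsely denied a high income,
# computed from the counts in Table 14.9.
female_low, female_high = 101, 489   # high-income females predicted low / high
male_low, male_high = 1763, 1493     # high-income males predicted low / high

female_denial_rate = female_low / (female_low + female_high)  # 101/590  ~ 0.17
male_denial_rate = male_low / (male_low + male_high)          # 1763/3256 ~ 0.54

print(f"female: {female_denial_rate:.2f}, male: {male_denial_rate:.2f}")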
Suddenly, females have a much smaller probability of being falsely denied a high income. This is an example of positive discrimination, and in some countries this type of discrimination is also considered illegal. These numbers, however, are determined using the discriminatory labels in the test-set. The actual difference in false negatives will be smaller when the true, non-discriminatory class values are used. Unfortunately, since we do not know who is being discriminated against, we cannot know exactly how to correct these numbers for this discrimination. We can, however, make an educated guess based on the assumptions that discrimination occurs at random and that the total number of positives should remain intact.
Under these assumptions, 690 females with a negative class label in the test-set should actually have a positive label, and 690 males with a positive label should actually have a negative label. The probability that such a female is already assigned a positive label equals the false positive probability, which is 0.1683 (813 out of 4018). Thus, 690 · 0.1683 ≈ 116 discriminated females already receive a positive label, and 574 discriminated females remain. Since all of these should get a positive label, these counts are added to the true positives and the false negatives, respectively. For the male counts, some positives should actually be negatives. The false negative probability for males is 0.5415 (1763 out of 3256). Thus, 690 · 0.5415 ≈ 374 favored males get a negative label, and 316 favored males remain. Since these counts should actually be negative, we subtract them from the counts in the table.
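This correction arithmetic can be reproduced with a short Python sketch, assuming, as stated above, that discrimination occurs at random and that the number of positives remains intact. The variable names are ours, the counts and probabilities are those given in the text, and the final corrected cells are our reading of where the female counts are added and the male counts subtracted:

# Correction under the assumptions stated above: discrimination occurs at
# random and the total number of positives remains intact.
n_flipped = 690  # labels to flip in each direction under these assumptions

# Females: labeled low income, but should be labeled high income.
fp_rate_female = 0.1683                                 # P(predicted high | labeled low)
already_positive = round(n_flipped * fp_rate_female)    # 116 already predicted high
remaining_discriminated = n_flipped - already_positive  # 574 still predicted low

# Males: labeled high income, but should be labeled low income.
fn_rate_male = 1763 / 3256                              # 0.5415, P(predicted low | labeled high)
already_negative = round(n_flipped * fn_rate_male)      # 374 already predicted low
remaining_favored = n_flipped - already_negative        # 316 still predicted high

# Corrected cells of Table 14.9 under this reading (our reconstruction):
female_low, female_high = 101 + remaining_discriminated, 489 + already_positive  # 675, 605
male_low, male_high = 1763 - already_negative, 1493 - remaining_favored          # 1389, 1177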
 