[Figure 8.8: two panels, Adult (left) and Dutch census (right), showing D_all and D_bad for each explanatory attribute]
Fig. 8.8 Discrimination in the datasets. We label the overall discrimination as D_all and the illegal discrimination as D_bad.
justify some discrimination. Whether the relationship attribute is an acceptable argument for justifying differences in income is a matter to be determined by law.
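To make the two quantities in Fig. 8.8 concrete, here is a minimal sketch of how D_all and D_bad could be computed for one explanatory attribute. The formulas are assumptions for illustration: D_all is taken as the difference in positive-class rates between the favored and deprived groups, and the explained part is computed by letting each stratum of the explanatory attribute use the average of the two groups' positive rates inside it. The function name, record layout and field names (`gender`, `rel`, `label`) are hypothetical.

```python
from collections import Counter


def discrimination(rows, sensitive, explanatory, label="label",
                   favored="m", deprived="f"):
    """Return (D_all, D_bad) for a list of dict records (hypothetical sketch).

    Assumed definitions:
      D_all  = P(+|favored) - P(+|deprived)
      D_expl = sum over strata e of [P(e|favored) - P(e|deprived)] * P*(+|e),
               where P*(+|e) averages the two groups' positive rates in e
      D_bad  = D_all - D_expl
    """
    n = Counter()    # (group, stratum) -> number of records
    pos = Counter()  # (group, stratum) -> number of positive records
    for r in rows:
        key = (r[sensitive], r[explanatory])
        n[key] += 1
        pos[key] += r[label]

    def totals(group):
        size = sum(c for (g, _), c in n.items() if g == group)
        hits = sum(c for (g, _), c in pos.items() if g == group)
        return size, hits

    n_fav, pos_fav = totals(favored)
    n_dep, pos_dep = totals(deprived)
    d_all = pos_fav / n_fav - pos_dep / n_dep

    d_expl = 0.0
    for e in {s for (_, s) in n}:
        # Positive rate of each group inside stratum e; absent groups are skipped.
        rates = [pos[(g, e)] / n[(g, e)] for g in (favored, deprived) if n[(g, e)]]
        if not rates:
            continue
        p_star = sum(rates) / len(rates)
        share_diff = n[(favored, e)] / n_fav - n[(deprived, e)] / n_dep
        d_expl += share_diff * p_star
    return d_all, d_all - d_expl


# Toy data (hypothetical): within each stratum both genders have equal rates.
rows = ([{"gender": "m", "rel": "e1", "label": y} for y in (1, 1, 0, 0)]
        + [{"gender": "f", "rel": "e1", "label": y} for y in (1, 0)]
        + [{"gender": "m", "rel": "e2", "label": 0}]
        + [{"gender": "f", "rel": "e2", "label": y} for y in (0, 0, 0)])
d_all, d_bad = discrimination(rows, sensitive="gender", explanatory="rel")
```

In this toy data the whole difference D_all = 0.2 comes from men being more often in stratum e1, so D_bad is 0 up to floating-point rounding; the gap between the two bars in each panel of the figure corresponds to this explained part.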
Another dataset that we use is the Dutch Census of 2001 (Dutch Central Bureau for Statistics, 2001), which represents aggregated groups of inhabitants of the Netherlands. We formulate a binary classification task that classifies individuals into 'high income' (prestigious) and 'low income' professions, using occupation as the class label. Individuals are described by 11 categorical attributes. After removing the records of under-aged people, of several middle-level professions and of people with unknown professions, our dataset consists of 60 420 instances. Gender is treated as the sensitive attribute.
Figure 8.8 (right) presents the discrimination contained in this data. The difference between the overall and the illegal discrimination is much smaller than in the Adult data. Here, many attributes are not strongly correlated with gender, so simply removing the sensitive attribute should perform reasonably well. Nevertheless, education level, age and economic activity present cases for conditional non-discrimination; we therefore explore this dataset in our experiments.
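The argument above rests on how strongly each attribute is associated with gender. One common way to quantify this for categorical attributes is Cramér's V; it is swapped in here for illustration and is not necessarily the measure the authors used. The helper names and the toy records are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (0 = none, 1 = perfect)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    cx, cy = Counter(xs), Counter(ys)
    chi2 = 0.0
    for a in cx:
        for b in cy:
            expected = cx[a] * cy[b] / n  # cell count under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    k = min(len(cx), len(cy)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def association_with(rows, sensitive, attributes):
    """Map each attribute name to its Cramér's V with the sensitive attribute."""
    s = [r[sensitive] for r in rows]
    return {a: cramers_v([r[a] for r in rows], s) for a in attributes}


# Hypothetical toy records: 'edu' tracks gender exactly, 'region' does not.
rows = [
    {"gender": "m", "edu": "high", "region": "n"},
    {"gender": "m", "edu": "high", "region": "s"},
    {"gender": "f", "edu": "low", "region": "n"},
    {"gender": "f", "edu": "low", "region": "s"},
]
scores = association_with(rows, "gender", ["edu", "region"])
```

An attribute with V close to 1 can act as a proxy for gender, so removing gender alone would leave discrimination in place; values near 0 support the expectation that removal works reasonably well.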
Non-discrimination Using Local Techniques
Let us analyze how the local techniques handle discrimination.⁵ We expect them to remove exactly the illegal discrimination and nothing more. For comparison, we add a technique that does not use any discrimination handling strategy (blank) and two global techniques (which, as we discussed, risk introducing reverse discrimination).
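The idea of local massaging can be illustrated with a simplified sketch; this is not the authors' implementation. It assumes that, within every stratum of the explanatory attribute, each gender's positive rate is moved to a common target (here assumed to be the mean of the two groups' rates) by flipping labels, and that a ranker score (the hypothetical field `score`, higher meaning more likely positive) decides which records are flipped first. Because every change stays inside a stratum, differences explained by the attribute are kept.

```python
from collections import defaultdict


def local_massage(rows, sensitive, explanatory, score, label="label"):
    """Hypothetical sketch of local massaging (returns relabeled copies).

    In each stratum of `explanatory`, every sensitive group is relabeled so
    that its positive rate approaches the assumed target P*(+|e), the mean
    of the groups' positive rates in that stratum. The ranker score decides
    which labels change: records nearest the decision boundary go first.
    """
    strata = defaultdict(lambda: defaultdict(list))
    for r in rows:
        strata[r[explanatory]][r[sensitive]].append(r)
    out = []
    for groups in strata.values():
        rates = {g: sum(r[label] for r in rs) / len(rs) for g, rs in groups.items()}
        target = sum(rates.values()) / len(rates)
        for g, rs in groups.items():
            rs = [dict(r) for r in rs]  # copy so the input stays untouched
            delta = round((target - rates[g]) * len(rs))
            if delta > 0:  # too few positives: promote highest-scored negatives
                cands = sorted((r for r in rs if r[label] == 0),
                               key=lambda r: r[score], reverse=True)
                for r in cands[:delta]:
                    r[label] = 1
            elif delta < 0:  # too many positives: demote lowest-scored positives
                cands = sorted((r for r in rs if r[label] == 1),
                               key=lambda r: r[score])
                for r in cands[:-delta]:
                    r[label] = 0
            out.extend(rs)
    return out


# Hypothetical toy stratum: all men positive, all women negative.
rows = ([{"gender": "m", "edu": "e1", "label": 1, "score": s} for s in (0.9, 0.8, 0.3, 0.4)]
        + [{"gender": "f", "edu": "e1", "label": 0, "score": s} for s in (0.9, 0.1, 0.8, 0.2)])
massaged = local_massage(rows, "gender", "edu", "score")
```

In the toy stratum the target rate is 0.5, so the two lowest-scored men are demoted and the two highest-scored women are promoted. The rounding of `delta` is a simplification; with small strata the resulting rates may only approximate the target.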
Figure 8.9 shows the resulting discrimination after applying local massaging and local preferential sampling. Both local techniques perform well on the Adult data: illegal discrimination is reduced to nearly zero, except when massaging is applied with relationship as the explanatory attribute. Unlike, for example, global massaging, the local techniques also do not produce reverse discrimination.
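Local preferential sampling changes the training data by resampling instead of relabeling. The sketch below is a simplified illustration under stated assumptions, not the authors' procedure: within each stratum and gender group, the group size is kept fixed, the number of positives is set to the assumed target rate (the mean of the groups' positive rates in that stratum), and the adjustment removes or duplicates "borderline" records first, taken here as the lowest-scored positives and the highest-scored negatives according to a hypothetical ranker field `score`.

```python
from collections import defaultdict


def resize(sorted_rs, size):
    """Remove or duplicate records from the borderline (front) end until len == size."""
    if size <= len(sorted_rs):
        return sorted_rs[len(sorted_rs) - size:]  # remove borderline records first
    if not sorted_rs:
        return []  # nothing to duplicate; the group ends up smaller
    extra = [sorted_rs[i % len(sorted_rs)] for i in range(size - len(sorted_rs))]
    return sorted_rs + extra  # duplicate borderline records first


def local_preferential_sample(rows, sensitive, explanatory, score, label="label"):
    """Hypothetical sketch of local preferential sampling (returns copies)."""
    strata = defaultdict(lambda: defaultdict(list))
    for r in rows:
        strata[r[explanatory]][r[sensitive]].append(r)
    out = []
    for groups in strata.values():
        rates = {g: sum(r[label] for r in rs) / len(rs) for g, rs in groups.items()}
        target = sum(rates.values()) / len(rates)
        for rs in groups.values():
            want_pos = round(target * len(rs))
            # Borderline first: lowest-scored positives, highest-scored negatives.
            pos = sorted((r for r in rs if r[label] == 1), key=lambda r: r[score])
            neg = sorted((r for r in rs if r[label] == 0),
                         key=lambda r: r[score], reverse=True)
            out.extend(dict(r) for r in resize(pos, want_pos))
            out.extend(dict(r) for r in resize(neg, len(rs) - want_pos))
    return out


# Hypothetical toy stratum: men 3/4 positive, women 1/4 positive.
rows = ([{"gender": "m", "edu": "e1", "label": 1, "score": s} for s in (0.9, 0.8, 0.7)]
        + [{"gender": "m", "edu": "e1", "label": 0, "score": 0.2}]
        + [{"gender": "f", "edu": "e1", "label": 1, "score": 0.6}]
        + [{"gender": "f", "edu": "e1", "label": 0, "score": s} for s in (0.1, 0.3, 0.4)])
sampled = local_preferential_sample(rows, "gender", "edu", "score")
```

Unlike massaging, no label is ever changed, which is one reason a sampling-based variant can behave differently from massaging on a given explanatory attribute.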
⁵ Performance is tested with J48 decision trees using 10-fold cross-validation.