Local preferential sampling applies the same principles as preferential sampling (Kamiran and Calders, 2010), but now locally, to partitions of the data. Within each partition it modifies the numbers of accepted male and female instances so that no redlining occurs. The procedure for local preferential sampling is presented in Figure 8.7.
Fig. 8.7 Local preferential sampling
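The sketch below illustrates one way the local procedure can be realized, assuming the data is held in a pandas DataFrame with a binary sensitive attribute, a binary class label, an explanatory attribute, and a precomputed ranking score (e.g. the positive-class probability of a ranker trained on the data). The function and column names are illustrative, not the authors' implementation, and for brevity only borderline positive instances are duplicated or removed, so the target acceptance rates are reached only approximately.

import pandas as pd


def local_preferential_sampling(df, sens, label, expl, score, pos):
    """Resample each partition (one value of the explanatory attribute) so that
    every sensitive group approaches the partition's target acceptance rate."""
    parts = []
    for _, part in df.groupby(expl):
        # Target acceptance rate: the average of the groups' rates within the
        # partition, so only the within-partition (illegal) gap is removed.
        target = part.groupby(sens)[label].apply(lambda y: (y == pos).mean()).mean()
        for _, g in part.groupby(sens):
            g = g.sort_values(score, ascending=False)   # confident positives first
            have = int((g[label] == pos).sum())
            want = int(round(target * len(g)))          # approximate target count
            if have < want:
                # Promote the group: duplicate its borderline (lowest-scored) positives.
                g = pd.concat([g, g[g[label] == pos].tail(want - have)])
            elif have > want:
                # Demote the group: drop its borderline positives.
                g = g.drop(index=g[g[label] == pos].tail(have - want).index)
            parts.append(g)
    return pd.concat(parts, ignore_index=True)

For the Adult data discussed below, a call with sens='sex', label='income' and expl='relationship' would correspond to treating the relationship attribute as explanatory.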
8.4.2 Computational Experiments
In this section we demonstrate the performance of the local discrimination-handling techniques on real-world datasets. The objective is to minimize the absolute value of the illegal discrimination while keeping the accuracy as high as possible. It is important not to overshoot and end up with reverse discrimination.
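As a point of reference, the sketch below shows how the quantities tracked in these experiments can be computed under the decomposition D_all = D_expl + D_illegal used for conditional discrimination: the explainable part is the gap that would remain if, within every value of the explanatory attribute, both genders were accepted at the partition's averaged rate. The column names and the exact averaging are assumptions of this sketch rather than necessarily the authors' formulation, and every partition is assumed to contain both groups.

import pandas as pd


def discrimination_measures(df, sens, label, expl, favored, protected, pos):
    """Return (D_all, D_expl, D_illegal) for a binary sensitive attribute."""
    def rate(group):
        # Acceptance rate: fraction of the group with the positive label.
        return (group[label] == pos).mean()

    fav, prot = df[df[sens] == favored], df[df[sens] == protected]
    d_all = rate(fav) - rate(prot)

    d_expl = 0.0
    for _, part in df.groupby(expl):
        p_fav, p_prot = part[part[sens] == favored], part[part[sens] == protected]
        # P*(+ | e_i): average of the two groups' acceptance rates in this
        # partition (assumes both groups occur in every partition).
        p_star = (rate(p_fav) + rate(p_prot)) / 2.0
        # Weight by how differently the two groups are spread over e_i.
        d_expl += (len(p_fav) / len(fav) - len(p_prot) / len(prot)) * p_star

    return d_all, d_expl, d_all - d_expl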
Data
For our experiments we use two real datasets. The Adult dataset comes from UCI (Asuncion and Newman, 2007); the task is to classify individuals into high- and low-income classes. Our dataset consists of a uniform sample of 15 696 instances, described by 13 attributes and a class label. Originally 6 of the 13 attributes were numeric; we discretized them. Gender is the sensitive attribute and income is the label. We repeat our experiments several times, each time selecting one of the other attributes as explanatory. Figure 8.8 (left) shows the discrimination in the dataset; the horizontal axis denotes the index of the explanatory attribute.
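A minimal sketch of the preprocessing just described is given below, assuming the raw UCI 'adult.data' file. The column list, the equal-frequency binning into four bins, and the random seed are assumptions, not the authors' exact setup.

import pandas as pd

COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week",
           "native-country", "income"]

# Load the raw UCI file and discard incomplete records. The raw file has 14
# predictor columns while the text works with 13; which one was dropped is not
# stated, so all are kept here.
adult = pd.read_csv("adult.data", names=COLUMNS, skipinitialspace=True,
                    na_values="?").dropna()

# Discretize the numeric attributes; equal-frequency binning into 4 bins is an
# assumption, not the authors' exact discretization.
for col in adult.select_dtypes("number").columns:
    adult[col] = pd.qcut(adult[col], q=4, duplicates="drop").astype(str)

# Uniform sample of the size reported in the text. Gender ("sex") is the
# sensitive attribute and income is the class label.
sample = adult.sample(n=15_696, random_state=0)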
In the Adult dataset a number of attributes are only weakly related to gender (such as workclass, education, occupation, race, capital loss, native country). Therefore, nominating any of those attributes as explanatory will not explain much of the discrimination. For instance, knowledge of biology suggests that race and gender are independent. Thus, race cannot explain the discrimination on gender; that discrimination is either illegal or it is due to some other attributes. Indeed, the plot shows that all the discrimination is illegal when race (attribute #7) is treated as explanatory. On the other hand, we observe that the relationship attribute (attribute #6) explains a great deal of D_all. Judging subjectively, the values 'wife' and 'husband' of this attribute clearly capture gender information, and from the data mining perspective, if we are allowed to treat this attribute as acceptable, a large part of the discrimination is explained. Age and working hours per week are other examples of explanatory attributes that