\[
\mathrm{disc}_{S=f}(C, D_{\mathrm{test}}) :=
\frac{\left|\{X \in D_{\mathrm{test}} \mid X(S) = m,\ C(X) = +\}\right|}{\left|\{X \in D_{\mathrm{test}} \mid X(S) = m\}\right|}
\;-\;
\frac{\left|\{X \in D_{\mathrm{test}} \mid X(S) = f,\ C(X) = +\}\right|}{\left|\{X \in D_{\mathrm{test}} \mid X(S) = f\}\right|}
\]
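As a concrete illustration, the measure above can be computed directly from a labelled test set and the classifier's predictions. The sketch below is a minimal example, assuming a hypothetical list-of-dictionaries data layout with sensitive values m/f and class labels +/−; the function name and arguments are placeholders, not the chapter's own code.

```python
def discrimination(records, predictions, sensitive="S", favored="m", deprived="f"):
    """Compute disc_{S=f}(C, D_test): the positive-prediction rate of the
    favored group minus that of the deprived group."""
    pos_fav = sum(1 for r, p in zip(records, predictions)
                  if r[sensitive] == favored and p == "+")
    n_fav = sum(1 for r in records if r[sensitive] == favored)
    pos_dep = sum(1 for r, p in zip(records, predictions)
                  if r[sensitive] == deprived and p == "+")
    n_dep = sum(1 for r in records if r[sensitive] == deprived)
    return pos_fav / n_fav - pos_dep / n_dep

# Example: 3 of 4 favored vs. 1 of 4 deprived instances predicted positive
D_test = [{"S": s} for s in ["m", "m", "m", "m", "f", "f", "f", "f"]]
C_pred = ["+", "+", "+", "-", "+", "-", "-", "-"]
print(discrimination(D_test, C_pred))  # 0.75 - 0.25 = 0.5
```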
12.3 Techniques for Discrimination-Free Classification
In this section we discuss different techniques for discrimination-aware classification. First, we discuss data pre-processing techniques to make the training data unbiased before learning a classifier. Second, we discuss the adaptation of the classifier learning procedure itself to make it discrimination-free. Third, we discuss the modification of the post-processing phase of a learnt classifier to make it unbiased.
12.3.1 Pre-processing Techniques
The first kind of solution is based on removing the discrimination from the training dataset. If we can remove discrimination directly from the source data, a classifier can be learnt on a cleaned, discrimination-free dataset. Our rationale for this approach is that, since the classifier is trained on discrimination-free data, it is likely that its predictions will be (more) discrimination-free as well, as the classifier will no longer generalize the discrimination. The first approach we discuss here is called massaging the data (Kamiran & Calders, 2009a). It is based on changing the class labels in order to remove the discrimination from the training data. The second approach is less intrusive as it does not change the class labels in the training data. Instead, weights are assigned to the data objects to make the dataset discrimination-free. This approach is called reweighing (Calders et al., 2009). Since reweighing requires the learner to be able to work with weighted tuples, we propose another variant, in which we re-sample the dataset in such a way that the discrimination is removed. We refer to this approach as Sampling (Kamiran & Calders, 2010).
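To make the reweighing idea concrete before its detailed treatment, the sketch below computes, for each combination of sensitive-attribute value and class label, the weight that would make the sensitive attribute independent of the class when applied to the training tuples, following the expected-over-observed-frequency idea of Calders et al. (2009). The function name and data layout are hypothetical placeholders, not the chapter's own code.

```python
from collections import Counter

def reweighing_weights(data, sensitive="S", label="Class"):
    """Weight for each (s, c) cell: expected frequency under independence
    of S and Class, divided by the observed frequency in the data."""
    n = len(data)
    s_counts = Counter(r[sensitive] for r in data)
    c_counts = Counter(r[label] for r in data)
    sc_counts = Counter((r[sensitive], r[label]) for r in data)
    return {
        (s, c): (s_counts[s] * c_counts[c]) / (n * sc_counts[(s, c)])
        for (s, c) in sc_counts
    }

# Deprived-group positives and favored-group negatives are under-represented,
# so they receive weights > 1; the other two cells receive weights < 1.
train = ([{"S": "m", "Class": "+"}] * 3 + [{"S": "m", "Class": "-"}] * 1
         + [{"S": "f", "Class": "+"}] * 1 + [{"S": "f", "Class": "-"}] * 3)
print(reweighing_weights(train))
```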
12.3.1.1 Massaging
In massaging we change the class labels in the training set: some objects of the deprived community change from class − to +, and the same number of objects of the favored community change from + to −. In this way the discrimination decreases, yet the overall class distribution is maintained; the same number of people has the positive class as before. This strategy reduces the discrimination to the desirable level with the least number of changes to the dataset while keeping the overall class distribution fixed. Notice that we do not randomly pick the objects to relabel. Instead, first we learn a regular, possibly discriminative (i.e., not discrimination-free) classifier. This classifier, although not acceptable as a final result, still provides useful information. Based on this classifier we can see, for the deprived and favored communities separately, which instances are closest to the decision boundary. Many classifiers assign a probability of being in the positive class to the instances, and if so, these probabilities can be used to rank the instances by their closeness to the decision boundary.
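As a hedged illustration of this relabelling step, the sketch below uses a ranker's positive-class probabilities to promote the deprived-group negatives closest to the decision boundary and to demote the same number of borderline favored-group positives. The variable names, scoring model, and the choice of how many labels to flip are assumptions made for the example, not prescribed by the chapter.

```python
def massage(data, scores, n_flips, sensitive="S", label="Class",
            deprived="f", favored="m"):
    """Flip n_flips labels in each group: promote the highest-scored
    deprived negatives and demote the lowest-scored favored positives."""
    promote = sorted(
        (i for i, r in enumerate(data)
         if r[sensitive] == deprived and r[label] == "-"),
        key=lambda i: scores[i], reverse=True)[:n_flips]
    demote = sorted(
        (i for i, r in enumerate(data)
         if r[sensitive] == favored and r[label] == "+"),
        key=lambda i: scores[i])[:n_flips]
    massaged = [dict(r) for r in data]  # leave the input data untouched
    for i in promote:
        massaged[i][label] = "+"
    for i in demote:
        massaged[i][label] = "-"
    return massaged

# Example: one flip per group, using hypothetical ranker scores P(Class=+ | X)
train = [{"S": "f", "Class": "-"}, {"S": "f", "Class": "-"},
         {"S": "m", "Class": "+"}, {"S": "m", "Class": "+"}]
scores = [0.45, 0.20, 0.55, 0.90]
print(massage(train, scores, n_flips=1))
```

Because the same number of labels is flipped in each group, the overall class distribution stays fixed while the discrimination measure of the training data decreases, in line with the strategy described above.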