model trained only on male data will learn that every male who scores over 70 on
the test should be accepted. We see that, for instance, applicants #3 and #4 have
identical characteristics except gender, yet they are offered different
decisions. This situation is generally considered to be discriminatory as well.
3.4.2 Computational Modeling for Discrimination Free Decision Making
Two main principles can be employed for making computational models discrimination free when the historical data is biased. A data miner can either correct the training data or impose constraints on the model during training.
Correcting the Training Data
The goal of correcting the training data is to make the dataset discrimination free
and/or unbiased. If the training data is discrimination free and unbiased, then we
expect a learned computational model to be discrimination free.
Different techniques, or combinations of techniques, can be employed for modifying the data, including but not limited to:
1. modifying labels of the training data,
2. duplicating or deleting individual samples,
3. adding synthetic samples,
4. transforming the data into a new representation space.
Several existing approaches for discrimination free computational modeling use data correction techniques (Kamiran & Calders, 2009, 2010). For more information see Chapter 12, where selected data correcting techniques are discussed in more detail.
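As an illustration, the label-modification idea can be sketched as a small "massaging" procedure in the spirit of Kamiran & Calders (2009): flip the labels of borderline training examples until both gender groups have (roughly) equal acceptance rates. The toy data, field names, and the score-based ranking of candidates below are illustrative assumptions, not the exact algorithm from the cited work.

```python
# Hypothetical sketch of label "massaging" (assumed field names and toy data):
# promote the best-scoring rejected members of the deprived group and demote
# the worst-scoring accepted members of the favored group, in pairs, until the
# positive (acceptance) rates of the two groups match.

def positive_rate(data, group):
    rows = [r for r in data if r["gender"] == group]
    return sum(r["label"] for r in rows) / len(rows)

def massage(data, favored="m", deprived="f"):
    data = [dict(r) for r in data]  # work on a copy of the training set
    # Candidates closest to the decision boundary, ranked by test score:
    promote = sorted((r for r in data if r["gender"] == deprived and r["label"] == 0),
                     key=lambda r: -r["score"])  # best-scoring rejected deprived
    demote = sorted((r for r in data if r["gender"] == favored and r["label"] == 1),
                    key=lambda r: r["score"])    # worst-scoring accepted favored
    for up, down in zip(promote, demote):
        if positive_rate(data, deprived) >= positive_rate(data, favored):
            break  # acceptance rates have been equalized
        up["label"], down["label"] = 1, 0  # flip one label in each group
    return data

train = [
    {"gender": "m", "score": 80, "label": 1},
    {"gender": "m", "score": 75, "label": 1},
    {"gender": "m", "score": 70, "label": 1},
    {"gender": "m", "score": 65, "label": 0},
    {"gender": "f", "score": 78, "label": 0},
    {"gender": "f", "score": 72, "label": 1},
    {"gender": "f", "score": 68, "label": 0},
    {"gender": "f", "score": 60, "label": 0},
]
fixed = massage(train)
print(positive_rate(fixed, "m"), positive_rate(fixed, "f"))  # → 0.5 0.5
```

After massaging, a model trained on `fixed` no longer sees a dataset in which gender predicts the outcome, at the cost of altering a small number of labels near the decision boundary.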
Imposing Constraints on Model Training
As an alternative to correcting the training data, the model training process can be directed in such a way that anti-discrimination constraints are enforced. The techniques for doing so depend on the specific computational model employed. Several approaches for imposing such constraints during training exist (Calders & Verwer, 2010; Kamiran, Calders, & Pechenizkiy, 2010). For more information see Chapter 14, where selected techniques for model training with constraints are discussed in more detail.
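To make the constrained-training idea concrete, here is a minimal sketch, not any of the cited methods specifically: a logistic regression trained by gradient descent with an added penalty term that pulls the mean predicted acceptance rates of the two gender groups together. The synthetic data, the squared-difference penalty, and all names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: impose an anti-discrimination constraint during
# training by adding lam * d**2 to the log-loss, where d is the difference
# in mean predicted acceptance rates between the two groups.

rng = np.random.default_rng(0)
n = 200
gender = rng.integers(0, 2, n)              # 1 = favored group (assumed)
score = rng.normal(70, 10, n)
# Biased historical labels: the favored group was accepted more readily.
label = ((score + 10 * gender) > 75).astype(float)
X = np.column_stack([np.ones(n), score / 100, gender])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(lam, steps=3000, lr=0.5):
    """Return the group acceptance-rate gap of the trained model."""
    w = np.zeros(3)
    a, b = gender == 1, gender == 0
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - label) / n             # gradient of the log-loss
        d = p[a].mean() - p[b].mean()            # acceptance-rate difference
        s = p * (1 - p)                          # sigmoid derivative
        grad_d = (X[a] * s[a, None]).mean(0) - (X[b] * s[b, None]).mean(0)
        w -= lr * (grad + lam * 2 * d * grad_d)  # penalized update
    p = sigmoid(X @ w)
    return p[a].mean() - p[b].mean()

# The penalty shrinks the gap relative to unconstrained training:
print(abs(train(0.0)), abs(train(10.0)))
```

The penalty weight `lam` trades predictive accuracy against the discrimination constraint; setting it to zero recovers ordinary logistic regression.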
3.5 Conclusion and Open Problems
We discussed the mechanisms that may produce computational models yielding discriminatory decisions. A purely statistics-based, unbiased learning algorithm may produce biased computational models if our training data is biased, incomplete, or incorrect due to discriminatory decisions in the past or due to properties of the data collection. We have outlined how different implicit assumptions in the computational techniques for inducing classifiers are often violated, and