model trained only on male data will learn that every male who scores over 70 on
the test should be accepted. We see that, for instance, applicants #3 and #4 have
identical characteristics except gender, yet they are offered different
decisions. This situation is generally considered to be discriminatory as well.
3.4.2 Computational Modeling for Discrimination Free Decision Making
Two main principles can be employed for making computational models discrimination free when the historical data is biased. A data miner can either correct the training data or impose constraints on the model during training.
Correcting the Training Data
The goal of correcting the training data is to make the dataset discrimination free
and/or unbiased. If the training data is discrimination free and unbiased, then we
expect a learned computational model to be discrimination free.
Different techniques, or combinations of techniques, can be employed for modifying the data, including but not limited to:
1. modifying labels of the training data,
2. duplicating or deleting individual samples,
3. adding synthetic samples,
4. transforming the data into a new representation space.
Several existing approaches for discrimination free computational modeling use data correction techniques (Kamiran & Calders, 2009, 2010). For more information see Chapter 12, where selected data correcting techniques are discussed in more detail.
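As an illustration, the label-modification idea can be sketched as a small "massaging" procedure in the spirit of Kamiran & Calders (2009): flip the labels of borderline training examples until both gender groups have (roughly) equal acceptance rates. The toy data, field names, and the score-based ranking of candidates below are illustrative assumptions, not the exact algorithm from the cited work.

```python
# Hypothetical sketch of label "massaging" (assumed field names and toy data):
# promote the best-scoring rejected members of the deprived group and demote
# the worst-scoring accepted members of the favored group, in pairs, until the
# positive (acceptance) rates of the two groups match.

def positive_rate(data, group):
    rows = [r for r in data if r["gender"] == group]
    return sum(r["label"] for r in rows) / len(rows)

def massage(data, favored="m", deprived="f"):
    data = [dict(r) for r in data]  # work on a copy of the training set
    # Candidates closest to the decision boundary, ranked by test score:
    promote = sorted((r for r in data if r["gender"] == deprived and r["label"] == 0),
                     key=lambda r: -r["score"])  # best-scoring rejected deprived
    demote = sorted((r for r in data if r["gender"] == favored and r["label"] == 1),
                    key=lambda r: r["score"])    # worst-scoring accepted favored
    for up, down in zip(promote, demote):
        if positive_rate(data, deprived) >= positive_rate(data, favored):
            break  # acceptance rates have been equalized
        up["label"], down["label"] = 1, 0  # flip one label in each group
    return data

train = [
    {"gender": "m", "score": 80, "label": 1},
    {"gender": "m", "score": 75, "label": 1},
    {"gender": "m", "score": 70, "label": 1},
    {"gender": "m", "score": 65, "label": 0},
    {"gender": "f", "score": 78, "label": 0},
    {"gender": "f", "score": 72, "label": 1},
    {"gender": "f", "score": 68, "label": 0},
    {"gender": "f", "score": 60, "label": 0},
]
fixed = massage(train)
print(positive_rate(fixed, "m"), positive_rate(fixed, "f"))  # → 0.5 0.5
```

After massaging, a model trained on `fixed` no longer sees a dataset in which gender predicts the outcome, at the cost of altering a small number of labels near the decision boundary.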
Imposing Constraints on Model Training
As an alternative to correcting the training data, the model training process can be directed in such a way that anti-discrimination constraints are enforced. The techniques for doing so depend on the specific computational model employed. Several approaches for imposing such constraints during training exist (Calders & Verwer, 2010; Kamiran, Calders, & Pechenizkiy, 2010). For more information see Chapter 14, where selected techniques for model training with constraints are discussed in more detail.
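To make the constrained-training idea concrete, here is a minimal sketch, not any of the cited methods specifically: a logistic regression trained by gradient descent with an added penalty term that pulls the mean predicted acceptance rates of the two gender groups together. The synthetic data, the squared-difference penalty, and all names are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: impose an anti-discrimination constraint during
# training by adding lam * d**2 to the log-loss, where d is the difference
# in mean predicted acceptance rates between the two groups.

rng = np.random.default_rng(0)
n = 200
gender = rng.integers(0, 2, n)              # 1 = favored group (assumed)
score = rng.normal(70, 10, n)
# Biased historical labels: the favored group was accepted more readily.
label = ((score + 10 * gender) > 75).astype(float)
X = np.column_stack([np.ones(n), score / 100, gender])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(lam, steps=3000, lr=0.5):
    """Return the group acceptance-rate gap of the trained model."""
    w = np.zeros(3)
    a, b = gender == 1, gender == 0
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - label) / n             # gradient of the log-loss
        d = p[a].mean() - p[b].mean()            # acceptance-rate difference
        s = p * (1 - p)                          # sigmoid derivative
        grad_d = (X[a] * s[a, None]).mean(0) - (X[b] * s[b, None]).mean(0)
        w -= lr * (grad + lam * 2 * d * grad_d)  # penalized update
    p = sigmoid(X @ w)
    return p[a].mean() - p[b].mean()

# The penalty shrinks the gap relative to unconstrained training:
print(abs(train(0.0)), abs(train(10.0)))
```

The penalty weight `lam` trades predictive accuracy against the discrimination constraint; setting it to zero recovers ordinary logistic regression.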
3.5 Conclusion and Open Problems
We discussed the mechanisms that may produce computational models yielding discriminatory decisions. A purely statistics-based, unbiased learning algorithm may produce biased computational models if our training data is biased, incomplete, or incorrect due to discriminatory decisions in the past or due to properties of the data collection. We have outlined how different implicit assumptions in the computational techniques for inducing classifiers are often violated, and