disability, marital status, genetic features, language and age. It does so in a number of settings, such as employment and training; access to housing, public services, education and health care; credit and insurance; and adoption. European efforts on the non-discrimination front make clear the fundamental importance for Europe's citizens of the effective implementation and enforcement of non-discrimination norms. As recent European Court of Justice case law on age discrimination suggests, non-discrimination norms constitute fundamental principles of the European legal order. (See, e.g., Case C-144/04 [2005] ECR I-9981 (ECJ), Judgment of the Court of 22 November 2005, Werner Mangold v Rüdiger Helm; Case C-555/07 [2010], Judgment of the Court (Grand Chamber) of 19 January 2010, Seda Kücükdeveci v Swedex GmbH & Co. KG.) It is therefore in the interest of banks, insurance companies, employment agencies, the police and other institutions that employ computational models for decision making about individuals to ensure that these models are free from discrimination. In this chapter, discrimination is considered to be present if a model yields different decisions for two individuals who have the same characteristics relevant to the decision making and differ only in a sensitive attribute (e.g., gender or race).
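Operationally, this definition amounts to a simple check: change only the sensitive attribute and see whether the decision changes. The sketch below illustrates this with a made-up scoring rule and attribute names; it is an illustration of the definition, not of any real system.

```python
def is_discriminatory(model, individual, sensitive_attr, sensitive_values):
    """True if changing only the sensitive attribute changes the decision."""
    decisions = set()
    for value in sensitive_values:
        variant = dict(individual)       # copy; all other attributes unchanged
        variant[sensitive_attr] = value
        decisions.add(model(variant))
    return len(decisions) > 1

# A toy scoring rule that (wrongly) uses gender next to a relevant attribute:
toy_model = lambda x: "grant" if x["income"] > 50_000 and x["gender"] == "m" else "deny"
applicant = {"income": 60_000, "credit_history": "good", "gender": "f"}
print(is_discriminatory(toy_model, applicant, "gender", ["m", "f"]))  # True
```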
The main reason that data mining can lead to discrimination is that the computational model construction methods are often based upon assumptions that turn out not to hold in practice. For example, it is generally assumed that the data on which the model is learned follows the same distribution as the data on which the classifier will have to operate; i.e., that the situation will not change.
In Section 3.2 we elaborate on the implicit assumptions made during classifier construction and illustrate with fictitious examples how they may be violated in real situations. In Section 3.3 we show how this mismatch between reality and the assumptions can lead to discriminatory decision processes. We discuss three types of problems that may occur: sampling bias, incomplete data, and incorrect labeling, and present detailed scenarios in which these problems are illustrated. In Section 3.4 we discuss some simple solutions to the discrimination problem and show why these straightforward approaches do not always solve it. Section 3.5 then concludes the chapter by giving an overview of the research problems and challenges in discrimination-aware data mining and connecting them to the other chapters in this book.
We would like to stress that all examples in this chapter are purely fictitious; they do not represent our experiences with discrimination in real life, nor our beliefs about where these processes are actually happening. Instead, this chapter is a purely mechanical study of how we believe such processes occur.
3.2 Characterization of the Computational Modeling Process
Computational models are mathematical models that predict an outcome from the characteristics of an object. For example, banks use computational models (classifiers) for credit scoring. Given characteristics of an individual, such as age, income and credit history, the goal is to predict whether a given client will repay the loan. Based on that prediction, a decision is made whether to grant the credit. Banks build their models using their historical databases of customer performance.
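As a minimal sketch of this workflow (with hypothetical column names and made-up records; real scoring systems are far richer), such a model could be built from a historical database and applied to a new applicant like this:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical database: characteristics plus the observed outcome.
history = pd.DataFrame({
    "age":            [25, 40, 35, 50, 23, 60],
    "income":         [30_000, 80_000, 55_000, 90_000, 20_000, 70_000],
    "prior_defaults": [1, 0, 0, 0, 2, 0],
    "repaid":         [0, 1, 1, 1, 0, 1],  # what the classifier learns to predict
})

X, y = history[["age", "income", "prior_defaults"]], history["repaid"]
scorer = LogisticRegression().fit(X, y)

# The predicted repayment probability drives the credit decision.
applicant = pd.DataFrame([{"age": 30, "income": 45_000, "prior_defaults": 0}])
p_repay = scorer.predict_proba(applicant)[0, 1]
print("grant" if p_repay >= 0.5 else "deny")
```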