disability, marital status, genetic features, language and age. It does so in a number of settings, such as employment and training; access to housing, public services, education and health care; credit and insurance; and adoption. European efforts on the non-discrimination front make clear the fundamental importance for Europe's citizens of the effective implementation and enforcement of non-discrimination norms. As recent European Court of Justice case law on age discrimination suggests, non-discrimination norms constitute fundamental principles of the European legal order. (See, e.g., Case C-144/04 [2005] ECR I-9981 (ECJ), Judgment of the Court of 22 November 2005, Werner Mangold v Rüdiger Helm; Case C-555/07 [2010], Judgment of the Court (Grand Chamber) of 19 January 2010, Seda Kücükdeveci v Swedex GmbH & Co. KG.) It is therefore in the interest of banks, insurance companies, employment agencies, the police and other institutions that employ computational models for decision making about individuals to ensure that these models are free from discrimination. In this chapter, discrimination is considered to be present if a model yields different decisions for two individuals who have the same characteristics relevant to the decision making and differ only in a sensitive attribute (e.g., gender or race).
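Operationally, this definition amounts to a simple check: change only the sensitive attribute and see whether the decision changes. The sketch below illustrates this with a made-up scoring rule and attribute names; it is an illustration of the definition, not of any real system.

```python
def is_discriminatory(model, individual, sensitive_attr, sensitive_values):
    """True if changing only the sensitive attribute changes the decision."""
    decisions = set()
    for value in sensitive_values:
        variant = dict(individual)       # copy; all other attributes unchanged
        variant[sensitive_attr] = value
        decisions.add(model(variant))
    return len(decisions) > 1

# A toy scoring rule that (wrongly) uses gender next to a relevant attribute:
toy_model = lambda x: "grant" if x["income"] > 50_000 and x["gender"] == "m" else "deny"
applicant = {"income": 60_000, "credit_history": "good", "gender": "f"}
print(is_discriminatory(toy_model, applicant, "gender", ["m", "f"]))  # True
```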
The main reason that data mining can lead to discrimination is that the computational model construction methods are often based upon assumptions that turn out not to hold in practice. For example, it is generally assumed that the data on which the model is learned follows the same distribution as the data on which the classifier will have to operate; i.e., that the situation will not change.
In Section 3.2 we elaborate on the implicit assumptions made during classifier construction and illustrate with fictitious examples how they may be violated in real situations. In Section 3.3 we show how this mismatch between reality and the assumptions can lead to discriminatory decision processes. We discuss three types of problems that may occur: sampling bias, incomplete data, and incorrect labeling, and present detailed scenarios in which these problems are illustrated. In Section 3.4 we discuss some simple solutions to the discrimination problem and show why these straightforward approaches do not always solve it. Section 3.5 then concludes the chapter by giving an overview of the research problems and challenges in discrimination-aware data mining and connecting them to the other chapters in this book.
We would like to stress that all examples in this chapter are purely fictitious; they do not represent our experiences with discrimination in real life, nor our beliefs about where these processes are actually happening. Instead, this chapter is a purely mechanical study of how we believe such processes occur.
3.2 Characterization of the Computational Modeling Process
Computational models are mathematical models that predict an outcome from the characteristics of an object. For example, banks use computational models (classifiers) for credit scoring. Given characteristics of an individual, such as age, income and credit history, the goal is to predict whether a given client will repay the loan. Based on that prediction, a decision is made whether to grant the credit. Banks build their models using their historical databases of customer performance.
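As a minimal sketch of this workflow (with hypothetical column names and made-up records; real scoring systems are far richer), such a model could be built from a historical database and applied to a new applicant like this:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical historical database: characteristics plus the observed outcome.
history = pd.DataFrame({
    "age":            [25, 40, 35, 50, 23, 60],
    "income":         [30_000, 80_000, 55_000, 90_000, 20_000, 70_000],
    "prior_defaults": [1, 0, 0, 0, 2, 0],
    "repaid":         [0, 1, 1, 1, 0, 1],  # what the classifier learns to predict
})

X, y = history[["age", "income", "prior_defaults"]], history["repaid"]
scorer = LogisticRegression().fit(X, y)

# The predicted repayment probability drives the credit decision.
applicant = pd.DataFrame([{"age": 30, "income": 45_000, "prior_defaults": 0}])
p_repay = scorer.predict_proba(applicant)[0, 1]
print("grant" if p_repay >= 0.5 else "deny")
```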