and reveal information about the ethnicity of a person. As we will see in Chapter 8
on explainable and non-explainable discrimination, however, it is not at all easy to
measure and assess such possibilities for indirect discrimination in practical cases.
The legal review in Chapter 4 shows that our definition of discrimination is in line
with current legislation forbidding direct as well as indirect discrimination. Article
2 of Council Directive 2000/43/EC of the European Union explicitly deals with indirect
discrimination: “indirect discrimination shall be taken to occur where an apparently
neutral provision, criterion or practice would put persons of a racial or
ethnic origin at a particular disadvantage compared with other persons, unless
that provision, criterion or practice is objectively justified by a legitimate aim and
the means of achieving that aim are appropriate and necessary.”
3.3.2 Scenario 1: Incorrect Labels
In this scenario the labels do not accurately represent the population that we are
interested in. In many cases there is a difference between the labels in the training
data and the labels that we want to predict on the basis of test data.
The labels in the historical data are the result of a biased and discriminative
decision-making process. Sample selection bias exists when, instead of simply
missing information on characteristics important to the process under study, the
researcher is also systematically missing subjects whose characteristics vary
from those of the individuals represented in the data (Blank et al., 2004). For
example, an employment bureau wants to implement a module to suggest suitable
jobs to unemployed people. For this purpose, a model is built based upon
historical records of former applicants who successfully acquired a job, linking
characteristics such as their education and interests to the job profile. Suppose,
however, that historically women have been treated unfairly by being denied higher
board functions. A data mining model will pick up this relation between
gender and higher board functions and use it for prediction.
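The employment bureau example can be illustrated with a minimal sketch. The
records, the counts, and the function name train_majority_model below are
hypothetical and chosen only for illustration; the "model" is a simple
majority-vote rule per group, standing in for whatever data mining technique
would be used in practice. The point is only that a model fitted to biased
labels reproduces the bias for equally qualified applicants.

```python
from collections import Counter, defaultdict

# Hypothetical historical records: (gender, qualified, got_board_function).
# The labels encode a biased process: qualified women were mostly denied.
history = (
    [("m", True, True)] * 40
    + [("m", True, False)] * 10
    + [("f", True, True)] * 10
    + [("f", True, False)] * 40
    + [("m", False, False)] * 50
    + [("f", False, False)] * 50
)


def train_majority_model(records):
    """Learn the majority label for each (gender, qualified) combination."""
    counts = defaultdict(Counter)
    for gender, qualified, label in records:
        counts[(gender, qualified)][label] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


model = train_majority_model(history)

# Two equally qualified applicants who differ only in gender.
suggest_for_man = model[("m", True)]
suggest_for_woman = model[("f", True)]
print(suggest_for_man, suggest_for_woman)
```

Under these assumed counts the learned rule suggests a board function to the
qualified man but not to the equally qualified woman, although nothing in the
code treats gender specially: the disparity comes entirely from the labels.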
Labeling changes in time. Imagine a bank wanting to make special offers to its
wealthier customers. For many customers only partial information is available,
because, e.g., they have accounts and stock portfolios with other banks as
well. Therefore, a model is learned that, based solely upon demographic
characteristics, decides if a person is likely to have a high income or not. Suppose
that one of the rules found in the historical data states that, overall, men are
likely to have a higher income than women. This fact can be exploited by the
classifier to deny the special offer to women. Recently, however, gender equality
programs and laws have resulted in closing the gender gap in income, such
that this relation between gender and income that exists in the historical data is
expected to vanish, or at least become less apparent than in the historical data.
For instance, the Distance Learning Center (2009) provides data indicating the
earning gap between male and female employees. Back in 1979 women earned
59 cents for every dollar of income that men earned. By 2009 that figure had
risen to 81 cents for every dollar of income that men earned. In this example, the