The Discovery of Discrimination - Discrimination and Privacy in the Information Society


of discrimination. While a-protection is defined with reference to elift, its definition clearly applies to any measure from Figure 5.1. An extension of a-protection that accounts for statistical significance is proposed in (Pedreschi et al., 2009; Ruggieri et al., 2010c). We also refer the reader to (Ruggieri et al., 2010a, 2010c) for the presentation and experimental evaluation of data mining algorithms that efficiently extract a-protective classification rules from a large dataset of historical decision records. Finally, (Pedreschi et al., 2012) show that the choice of a reference measure from Figure 5.1 has a critical impact on the ranking imposed over the set of PD classification rules. In other words, selecting a specific discrimination measure is not a neutral choice: it implies a specific moral criterion for evaluating the degree of discrimination in a given context, i.e., a different way of establishing how bad a discriminatory action is. We found it interesting that our quantitative logical framework for discriminatory rules can help in understanding the consequences of such choices in law and jurisprudence.
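The effect of the measure on the ranking can be illustrated with a minimal sketch. It assumes two hypothetical rules, each described only by the confidence of the PD rule and the confidence of its base rule, and compares a ratio-based measure (the form usually given for elift) with a difference-based one. The rule names, the confidence values, and the function names are illustrative assumptions, not data from the chapter, and whether these two forms match specific entries of Figure 5.1 is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical rules: (confidence of A,B -> C, confidence of the base rule B -> C).
# The values are made up purely for illustration.
RULES: Dict[str, Tuple[float, float]] = {
    "rule_1": (0.10, 0.02),  # rare denial, but five times more frequent for the group
    "rule_2": (0.90, 0.50),  # frequent denial, much more frequent for the group
}


def ratio_measure(conf_rule: float, conf_base: float) -> float:
    """Ratio-based measure: how many times more likely the denial is."""
    return conf_rule / conf_base


def difference_measure(conf_rule: float, conf_base: float) -> float:
    """Difference-based measure: absolute increase in the denial rate."""
    return conf_rule - conf_base


def rank(rules: Dict[str, Tuple[float, float]],
         measure: Callable[[float, float], float]) -> List[str]:
    """Return rule names sorted from most to least discriminatory under `measure`."""
    return sorted(rules, key=lambda name: measure(*rules[name]), reverse=True)


if __name__ == "__main__":
    print("by ratio:     ", rank(RULES, ratio_measure))
    print("by difference:", rank(RULES, difference_measure))
```

Under these assumed values the two measures order the rules in opposite ways: the ratio favors the rare but proportionally large disparity, while the difference favors the large absolute disparity.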

5.3 Direct Discrimination Discovery

From this section on, we formalize various legal concepts in discrimination analysis and discovery as reasoning over the set of extracted classification rules. We start by considering direct discrimination, which, according to (Ellis, 2005), occurs “where one person is treated less favorably than another”. To establish prima facie evidence in a case before the court, it is enough to show that a single individual has been treated unfairly in comparison to another. However, this may be difficult to prove. The complainant may then use aggregate analysis to establish a regular pattern of unfavorable treatment of the disadvantaged group she belongs to. This is also the approach that control authorities and internal auditing may undertake when analysing historical decisions in search of contexts of discrimination against groups protected by law. In direct discrimination, we assume that the input dataset contains attributes that denote potentially discriminated groups. This is a reasonable assumption for attributes such as sex and age, or for attributes that can be explicitly added by control authorities, such as pregnancy status. The next section will consider the case of attributes that are not available at all, or that cannot even be collected. Under our assumption, regular patterns of discrimination can then be identified by looking at PD classification rules of the form:

    A, B → BENEFIT = DENIED

i.e., where the consequent consists of denying a benefit (a loan, school admission, a job, etc.). Rules of this form are then screened by selecting or ranking those whose value of a reference discrimination measure reaches a minimum threshold. In terms of Def. 4, we are then looking for “a-discrimination of PD classification rules denying benefit”.
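The screening step described above can be sketched as follows. This is a minimal illustration, not the extraction algorithm of the cited papers: it assumes the usual definition of extended lift, elift(A, B → C) = conf(A, B → C) / conf(B → C), and a tiny made-up dataset. The attribute names (`SEX`, `PURPOSE`, `BENEFIT`), the records, the helper names, and the threshold value are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Record = Dict[str, str]
Itemset = FrozenSet[Tuple[str, str]]  # a set of (attribute, value) items


def covers(record: Record, itemset: Itemset) -> bool:
    """True if the record satisfies every (attribute, value) item."""
    return all(record.get(attr) == val for attr, val in itemset)


def confidence(records: List[Record], premise: Itemset, consequent: Itemset) -> float:
    """Fraction of records matching the premise that also match the consequent."""
    matching = [r for r in records if covers(r, premise)]
    if not matching:
        return 0.0
    return sum(1 for r in matching if covers(r, consequent)) / len(matching)


def elift(records: List[Record], pd_items: Itemset, context: Itemset,
          consequent: Itemset) -> Optional[float]:
    """Assumed definition: conf(A,B -> C) / conf(B -> C); None if undefined."""
    base = confidence(records, context, consequent)
    if base == 0.0:
        return None
    return confidence(records, pd_items | context, consequent) / base


def screen(records: List[Record], candidates: List[Tuple[Itemset, Itemset]],
           consequent: Itemset, threshold: float) -> List[Tuple[float, Itemset, Itemset]]:
    """Keep rules A,B -> C with elift >= threshold, ranked by decreasing elift."""
    kept = []
    for pd_items, context in candidates:
        value = elift(records, pd_items, context, consequent)
        if value is not None and value >= threshold:
            kept.append((value, pd_items, context))
    kept.sort(key=lambda t: t[0], reverse=True)
    return kept


# Hypothetical decision records: 8 applicants asking for a new-car loan.
RECORDS: List[Record] = (
    [{"SEX": "FEMALE", "PURPOSE": "NEW_CAR", "BENEFIT": "DENIED"}] * 3
    + [{"SEX": "FEMALE", "PURPOSE": "NEW_CAR", "BENEFIT": "GRANTED"}] * 1
    + [{"SEX": "MALE", "PURPOSE": "NEW_CAR", "BENEFIT": "DENIED"}] * 1
    + [{"SEX": "MALE", "PURPOSE": "NEW_CAR", "BENEFIT": "GRANTED"}] * 3
)
FEMALE: Itemset = frozenset({("SEX", "FEMALE")})
MALE: Itemset = frozenset({("SEX", "MALE")})
NEW_CAR: Itemset = frozenset({("PURPOSE", "NEW_CAR")})
DENIED: Itemset = frozenset({("BENEFIT", "DENIED")})

if __name__ == "__main__":
    for value, a, b in screen(RECORDS, [(FEMALE, NEW_CAR), (MALE, NEW_CAR)], DENIED, 1.2):
        print(sorted(a), sorted(b), "-> BENEFIT = DENIED", "elift =", value)
```

In this toy data, half of all new-car applicants are denied, against three quarters of the female ones, so only the rule with SEX = FEMALE in the premise passes the assumed threshold of 1.2.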

As an example, consider our running example dataset and fix the PD items as in Table 5.1. By ranking classification rules of the form A, B → CLASS = BAD according to their extended lift measure, we found the following near the top positions:

PERSONAL STATUS = FEMALE, FOREIGN WORKER = YES,
