The Discovery of Discrimination - Discrimination and Privacy in the Information Society


items, denoting groups of people that could potentially be discriminated against. Given a classification rule SEX = FEMALE, CAR = OWN → CREDIT = NO, it is straightforward to separate in its premise SEX = FEMALE from CAR = OWN, in order to reason about potential discrimination against women with respect to people owning a car.

However, discrimination typically occurs for subgroups rather than for the whole group (the US courts coined the term “gender-plus allegations” to describe conduct breaching the law on the ground of sex-plus-something-else), or it may occur for multiple causes (called multiple discrimination in ENAR, 2007). For instance, we could be interested in discrimination against older women. With our syntax, this group would be represented as the itemset SEX = FEMALE, AGE = OLDER. The intersection of two disadvantaged minorities (here, SEX = FEMALE and AGE = OLDER) is a, possibly empty, smaller (even more disadvantaged) minority as well. As a consequence, we generalize the notion of potentially discriminatory item to the one of potentially discriminatory (PD) itemset, and assume that the downward closure property holds for PD itemsets (Ruggieri et al., 2010a).

Definition 1. If A1 and A2 are PD itemsets, then A1, A2 is a PD itemset as well.

On the technical side, the downward closure property is a sufficient condition for separating PD itemsets in the premise of a classification rule, namely, there is only one way A, B of splitting the premise of a rule into a PD itemset A and a PND itemset B.

Definition 2. A classification rule A, B → C is called potentially discriminatory (PD rule) if A is non-empty, and potentially non-discriminatory (PND rule) otherwise.

PD rules explicitly state conclusions involving potentially discriminated groups. PD rules cannot be extracted from datasets that do not contain potentially discriminatory items. In such a case, PND rules can still indirectly unveil discriminatory practices (see Section 5.4).
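The splitting of a premise and the PD/PND distinction of Definition 2 can be illustrated with a minimal Python sketch. It assumes that an itemset is PD exactly when all of its items belong to a fixed set of PD items chosen by the analyst, which is one simple reading consistent with Definition 1. The names `PD_ITEMS`, `split_premise` and `is_pd_rule` are hypothetical and do not come from the text.

```python
from typing import FrozenSet, Iterable, Tuple

Item = Tuple[str, str]  # (attribute, value), e.g. ("SEX", "FEMALE")

# Hypothetical choice of PD items; in practice the analyst fixes this set.
PD_ITEMS: FrozenSet[Item] = frozenset({("SEX", "FEMALE"), ("AGE", "OLDER")})


def split_premise(
    premise: Iterable[Item], pd_items: FrozenSet[Item] = PD_ITEMS
) -> Tuple[FrozenSet[Item], FrozenSet[Item]]:
    """Split a rule premise into its PD part A and its PND part B.

    Assumption: an itemset is PD when every item in it is a PD item, so
    the split is unique: A collects the PD items, B collects the rest.
    """
    items = frozenset(premise)
    a = items & pd_items  # items denoting potentially discriminated groups
    b = items - pd_items  # remaining (context) items
    return a, b


def is_pd_rule(
    premise: Iterable[Item], pd_items: FrozenSet[Item] = PD_ITEMS
) -> bool:
    """Definition 2: a rule A, B -> C is a PD rule iff A is non-empty."""
    a, _ = split_premise(premise, pd_items)
    return len(a) > 0


# Premise of the rule SEX = FEMALE, CAR = OWN -> CREDIT = NO
a, b = split_premise({("SEX", "FEMALE"), ("CAR", "OWN")})
print(sorted(a), sorted(b))
```

Under this assumption, the premise SEX = FEMALE, AGE = OLDER is entirely PD, in line with Definition 1, while a premise such as CAR = OWN alone yields a PND rule.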

Let us consider now how to quantitatively measure the “burden” imposed on such groups and unveiled by a discovered PD rule. Unfortunately, legislation offers neither uniformity nor general agreement on a standard quantification of discrimination. A general principle mentioned by Knopff (1986) is to consider group under-representation as a quantitative measure of the qualitative requirement that people in a group are treated “less favorably” (see European Union Legislation, 2011; U.K. Legislation, 2011) than others, or such that “a higher proportion of people without the attribute comply or are able to comply” (see Australian Legislation, 2011) with a qualifying criterion. We recall from Ruggieri et al. (2010a) the notion of extended lift¹, a measure of the increased confidence in concluding an assertion C resulting from adding (potentially discriminatory) information A to a rule B → C where no PD itemset appears.

¹ The term “extended lift” originates from the fact that it conservatively extends the well-known measure of lift (or interest factor) of an association rule (Tan et al., 2004), which is obtained, as a special case, when B is empty. Conversely, the extended lift of A, B → C corresponds to the lift of A → C over the set of transactions supporting B.
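The extended lift can be made concrete with a small Python sketch. It assumes the usual reading of the definition in Ruggieri et al. (2010a), namely the ratio conf(A, B → C) / conf(B → C), and runs on an invented toy dataset. The function names and the data are illustrative assumptions, not taken from the text.

```python
from typing import FrozenSet, Iterable, List, Optional

Transaction = FrozenSet[str]


def support_count(transactions: List[Transaction], itemset: Iterable[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, premise, conclusion) -> Optional[float]:
    """conf(premise -> conclusion), or None if no transaction supports the premise."""
    covered = support_count(transactions, premise)
    if covered == 0:
        return None
    both = support_count(transactions, frozenset(premise) | frozenset(conclusion))
    return both / covered


def extended_lift(transactions, a, b, c) -> Optional[float]:
    """Assumed definition: conf(A, B -> C) / conf(B -> C)."""
    conf_ab = confidence(transactions, frozenset(a) | frozenset(b), c)
    conf_b = confidence(transactions, b, c)
    if conf_ab is None or not conf_b:
        return None  # undefined when a premise is unsupported or conf(B -> C) is 0
    return conf_ab / conf_b


def make(n: int, *items: str) -> List[Transaction]:
    """Helper: n identical toy transactions."""
    return [frozenset(items)] * n


# Invented toy data, for illustration only.
TRANSACTIONS = (
    make(3, "SEX=FEMALE", "CAR=OWN", "CREDIT=NO")
    + make(1, "SEX=FEMALE", "CAR=OWN", "CREDIT=YES")
    + make(1, "SEX=MALE", "CAR=OWN", "CREDIT=NO")
    + make(3, "SEX=MALE", "CAR=OWN", "CREDIT=YES")
    + make(1, "SEX=FEMALE", "CAR=NONE", "CREDIT=NO")
    + make(1, "SEX=MALE", "CAR=NONE", "CREDIT=YES")
)

A, B, C = {"SEX=FEMALE"}, {"CAR=OWN"}, {"CREDIT=NO"}

# conf(A, B -> C) = 3/4 and conf(B -> C) = 4/8, so the ratio is 1.5.
elift = extended_lift(TRANSACTIONS, A, B, C)

# Footnote check: lift of A -> C restricted to the transactions supporting B.
supporting_b = [t for t in TRANSACTIONS if frozenset(B) <= t]
lift_within_b = extended_lift(supporting_b, A, set(), C)

# Special case B empty: the plain lift of A -> C over all transactions.
plain_lift = extended_lift(TRANSACTIONS, A, set(), C)

print(elift, lift_within_b, plain_lift)
```

On this toy data, adding SEX = FEMALE to the context CAR = OWN raises the confidence of CREDIT = NO from 0.5 to 0.75, and the extended lift coincides with the lift of A → C computed only over the transactions supporting B, as the footnote states.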
