So far, we have assumed that discriminatory items are recorded in the source
data. This is not always the case: race, for instance, may not be available or even
collectable. What if the discriminatory variables are not directly available? In this
case, indirect discrimination may occur. Consider the rule ZIP=10451, CITY=NYC
→ CLASS=BAD, with confidence 0.95, stating that the residents of a given neighbor-
hood of NYC are assigned bad credit with a 95% chance. On its face, this rule does
not unveil any discriminatory practice. However, assume that the following other
rule can be coded from available information, such as census data: ZIP=10451,
CITY=NYC → RACE=BLACK, with confidence 0.80, stating that 80% of the resi-
dents of that particular neighborhood of NYC are black. Then it is possible to
prove a theoretical lower bound of 0.94 for the confidence of the combined rule
ZIP=10451, CITY=NYC, RACE=BLACK → CLASS=BAD, stating that at least 94% of
black people in that neighborhood are assigned bad credit, around 3.7 times the
rate for the general population of NYC. This reasoning shows that the original rule
unveils a case of redlining.
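The lower bound above follows from a simple counting argument: at most a fraction 1 − 0.95 of the neighborhood's residents avoid bad credit, so at least 0.80 − 0.05 of the residents are both black and assigned bad credit, out of the 0.80 who are black. A minimal sketch of this computation (the function name is ours, not from the chapter):

```python
def combined_rule_lower_bound(conf_class: float, conf_race: float) -> float:
    """Lower bound on the confidence of the combined rule
    (e.g., ZIP, CITY, RACE -> CLASS), given:
      conf_class = confidence of ZIP, CITY -> CLASS
      conf_race  = confidence of ZIP, CITY -> RACE.
    At most (1 - conf_class) of the context avoids CLASS=BAD, so at
    least (conf_race - (1 - conf_class)) of it satisfies both RACE
    and CLASS; dividing by conf_race yields the bound."""
    return max(0.0, (conf_class + conf_race - 1.0) / conf_race)

bound = combined_rule_lower_bound(0.95, 0.80)
print(round(bound, 4))  # 0.9375, i.e., the 0.94 quoted in the text
```

Note that the bound is tight in the worst case: it assumes all residents who are not assigned bad credit are black, which minimizes the confidence of the combined rule.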
Different measures of the discrimination power of the mined decision rules can
be defined, according to the provisions of different anti-discrimination regulations:
e.g., the EU Directives (European Union Legislation, 2011) state that discrimination
on a given attribute occurs when “a higher proportion of people without the attribute
comply or are able to comply” (which we will code as the risk ratio measure), while
the US Equal Pay Act (U.S. Federal Legislation, 2011) states that: “a selection rate
for any race, sex, or ethnic group which is less than four-fifths of the rate for the
group with the highest rate will generally be regarded as evidence of adverse impact”
(which we will code as the selection ratio measure).
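Both measures can be computed from the rates at which each group receives an outcome. As an illustration (the function names and figures are ours, chosen to match the quoted provisions, not the chapter's formal definitions):

```python
def risk_ratio(bad_protected: int, n_protected: int,
               bad_others: int, n_others: int) -> float:
    """Ratio of the bad-outcome rate in the protected group to the
    rate in the remaining population; values well above 1 suggest a
    higher proportion of the protected group fails to obtain the
    benefit (cf. the EU Directives' wording)."""
    return (bad_protected / n_protected) / (bad_others / n_others)

def selection_ratio(selected_group: int, n_group: int,
                    selected_best: int, n_best: int) -> float:
    """Selection rate of a group relative to the group with the
    highest rate; a value below 4/5 is regarded as evidence of
    adverse impact under the US four-fifths rule."""
    return (selected_group / n_group) / (selected_best / n_best)

# Hypothetical figures: 30 of 100 protected applicants are granted
# good credit, versus 60 of 100 in the best-treated group.
sr = selection_ratio(30, 100, 60, 100)
print(round(sr, 2), sr < 4 / 5)  # 0.5 True -> prima facie adverse impact
```

The two measures can disagree on the same data, which is precisely why the analyst should be able to plug in the measure mandated by the applicable regulation.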
Our discrimination discovery approach opens a promising avenue for research,
based on an apparently paradoxical idea: data mining, which is typically used to
create potentially discriminatory profiles and classifications, can also be used the
other way round, as a powerful aid to the anti-discrimination analyst, capable of
automatically discovering the patterns of discrimination that emerge from the avail-
able data with the strongest prima facie evidence. Preliminary experiments on
a dataset of credit decisions made by a German bank show that this method is
able to pinpoint evidence of discrimination: the cited highly discriminatory rule that
"foreign worker women are assigned bad credit among those who intend to buy a
new car" is actually discovered from such a database.
The rest of the chapter is organized as follows. Section 5.2 introduces the tech-
nicalities of classification rules and measures of discrimination defined over them.
Using those tools, we show how the anti-discrimination analyst can go through the
analysis of direct discrimination (Section 5.3), indirect discrimination (Section 5.4),
respondent argumentation (Section 5.5), and affirmative actions (Section 5.6). Some
details on the analytical tool DCUBE, which supports the discrimination discovery
process, are provided in Section 5.7. Finally, we summarize the approach and dis-
cuss some challenging lines for future research.