Techniques for Discrimination-Free Predictive Models - Discrimination and Privacy in the Information Society


not remove the age-discrimination, as many other attributes, such as own house, indicating whether the applicant is a home-owner, turn out to be good predictors for age. A parallel can be drawn with the practice of redlining: denying services such as loans to inhabitants of particular racially determined areas. The term originally described the now abolished practice of marking a red line on a map to delineate the area where banks would not invest; later it came to be used for indirect discrimination against a particular group of people (usually by race or sex), regardless of geography.¹

12.2.2 Measuring Discrimination

There are many different ways in which discrimination could be quantified, each with its own advantages and disadvantages. In this chapter, as in our earlier work (Calders et al., 2009; Kamiran & Calders, 2010; Kamiran et al., 2010b; Kamiran & Calders, 2009a,b; Kamiran et al., 2010a; Calders & Verwer, 2010), we define the level of discrimination in a dataset as the difference between the probability that someone from the favored group is assigned the positive class and the probability that someone from the deprived community is assigned the positive class. For alternative measures of discrimination, see Chapters 5 and 6 of this book.

For the running example of Table 12.1, the discrimination with respect to the deprived community Sex = female is 4/5 − 2/5 = 40%. Formally, for a sensitive attribute $S$, deprived community (sensitive attribute value) $f$, and favored community $m$, the discrimination in $D$ with respect to the group $S = f$, denoted $\mathit{disc}_{S=f}(D)$, is defined as:

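\[
\mathit{disc}_{S=f}(D) = \frac{|\{x \in D \mid x(S) = m,\ x(\mathit{Class}) = +\}|}{|\{x \in D \mid x(S) = m\}|} - \frac{|\{x \in D \mid x(S) = f,\ x(\mathit{Class}) = +\}|}{|\{x \in D \mid x(S) = f\}|}
\]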
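As a minimal sketch, the dataset-level measure can be computed directly from its definition: the positive rate in the favored group minus the positive rate in the deprived group. The function name `discrimination`, its parameters, and the toy rows below are hypothetical; the rows only reproduce the counts quoted for the running example (4 of 5 males and 2 of 5 females with a positive class) and are not the contents of Table 12.1.

```python
def discrimination(rows, sensitive="Sex", deprived="female", favored="male",
                   label="Class", positive="+"):
    """Positive rate of the favored group minus that of the deprived group."""

    def positive_rate(group):
        members = [r for r in rows if r[sensitive] == group]
        if not members:
            raise ValueError(f"no examples with {sensitive} = {group!r}")
        return sum(r[label] == positive for r in members) / len(members)

    return positive_rate(favored) - positive_rate(deprived)


def make_rows(sex, cls, n):
    # Hypothetical helper: n identical toy examples.
    return [{"Sex": sex, "Class": cls} for _ in range(n)]


# Toy data matching only the quoted counts: 4/5 males and 2/5 females positive.
rows = (make_rows("male", "+", 4) + make_rows("male", "-", 1)
        + make_rows("female", "+", 2) + make_rows("female", "-", 3))

print(f"{discrimination(rows):.0%}")  # 40%
```

A value of 0 would mean both groups receive the positive class at the same rate; a negative value would mean the group labeled as deprived is in fact treated more favorably.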

When measuring the discrimination of a classifier, we want to assess how the classifier will act on new, previously unseen examples. We assume a setting in which examples arrive one at a time and the classifier must assign a label to each of them immediately. To assess the level of discrimination of the classifier when applied to unseen examples, we use a test set; that is, following standard machine learning practice, before learning a classifier we split the dataset into two parts: one for learning the classifier, and one for measuring its quality. The examples of the test set (with their labels removed) are passed one by one to the classifier and its decisions are recorded. After that, the discrimination of the classifier can be assessed as follows. The discrimination of the classifier $C$ with respect to the group $S = f$ on a test dataset $D_{test}$, denoted $\mathit{disc}_{S=f}(C, D_{test})$, is defined as:

¹ Source: http://en.wikipedia.org/wiki/redlining, November 17th, 2011.
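The test-set protocol described in this section (hold out part of the data, pass the unlabeled test examples to the classifier one at a time, record its decisions) might be sketched as follows. Everything named here is hypothetical: the `Job` attribute, the stand-in learner `train_majority_by_feature`, the split ratio, and the toy data. The final step compares positive-decision rates between the favored and the deprived group, which is an assumption made by analogy with the dataset-level measure, using the classifier's decisions in place of the class labels.

```python
import random
from collections import Counter


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_majority_by_feature(train, feature, label="Class"):
    """Hypothetical stand-in learner: for each value of `feature`,
    predict the label seen most often with it in the training data."""
    per_value = {}
    for r in train:
        per_value.setdefault(r[feature], Counter())[r[label]] += 1
    default = Counter(r[label] for r in train).most_common(1)[0][0]
    table = {v: c.most_common(1)[0][0] for v, c in per_value.items()}
    return lambda example: table.get(example[feature], default)


def classifier_discrimination(classify, test, sensitive="Sex",
                              deprived="female", favored="male",
                              label="Class", positive="+"):
    """Assumed measure: positive-decision rate of the favored group
    minus that of the deprived group on the test set."""
    decisions = []
    for row in test:  # one example at a time
        unlabeled = {k: v for k, v in row.items() if k != label}
        decisions.append((row[sensitive], classify(unlabeled)))

    def rate(group):
        preds = [d for g, d in decisions if g == group]
        if not preds:
            raise ValueError(f"no test examples with {sensitive} = {group!r}")
        return sum(d == positive for d in preds) / len(preds)

    return rate(favored) - rate(deprived)


def make_rows(sex, job, cls, n):
    # Hypothetical helper: n identical toy examples.
    return [{"Sex": sex, "Job": job, "Class": cls} for _ in range(n)]


data = (make_rows("male", "yes", "+", 8) + make_rows("male", "no", "-", 2)
        + make_rows("female", "yes", "+", 4) + make_rows("female", "no", "-", 6))

train, test = split(data)
classifier = train_majority_by_feature(train, feature="Job")
print(classifier_discrimination(classifier, test))
```

Note that the stand-in learner never reads the sensitive attribute, yet its decisions can still differ between the groups whenever `Job` is correlated with `Sex` in the data, which is the indirect effect the redlining discussion points to.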
