Why Unbiased Computational Processes Can Lead to Discriminative Decision Procedures - Discrimination and Privacy in the Information Society

all the individuals as if they earn the same and thus lose the opportunity to improve upon accuracy for people with very high and very low income stability.

If the two assumptions are satisfied, it is reasonable to expect that models will transfer the knowledge from the historical data to future decision making. If the historical data is prejudiced, however, the models trained on this data can be expected to yield prejudiced decisions. As we will see in the following subsection, the assumptions may not hold in reality due to the origins of the data. If the i.i.d. assumptions are not satisfied, the computational models built in such settings might still be valid; however, the possible effects of these violations need to be taken into account when interpreting the results.

3.2.2 Origins of Training Data

In order to identify the sources of possible discrimination in trained models, we need to analyze the origins and the characteristics of the training data.

Data Collection

First of all, the data collection process may be intentionally or unintentionally biased. For instance, Turner & Skidmore (1999) discuss different stages of the mortgage lending process that may lead to racial discrimination. Advertising and promotions can be sent to selected neighborhoods. Pre-application consultancy may be offered on a biased basis. These actions may lead to a situation in which the historical database of applicants does not represent the potential clients. Other examples of biased data collection include racial profiling of crime suspects or selecting people for further security checks at airports. If people of particular ethnic backgrounds are stopped for searches more often, even if they were never convicted of carrying forbidden items, the historical database will contain a skewed representation of the population.
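The search example can be made concrete with a small simulation. The following is a minimal sketch, not an analysis of real data: the two groups, the 2% carrying rate and the stop rates are all hypothetical values chosen for illustration. Both groups behave identically, yet group B is assumed to be stopped five times as often as group A.

```python
import random

def simulate_stops(population, stop_rate, seed=0):
    """Return the records of the people who were stopped.

    population: list of (group, carries_item) tuples
    stop_rate:  dict mapping group -> probability of being stopped
                (hypothetical values, not taken from any study)
    """
    rng = random.Random(seed)
    return [person for person in population
            if rng.random() < stop_rate[person[0]]]

def group_share(records, group):
    """Fraction of the records that belong to the given group."""
    return sum(1 for g, _ in records if g == group) / len(records)

# Hypothetical population: two equally sized groups with the same
# (assumed) 2% chance of carrying a forbidden item.
rng = random.Random(42)
population = [(g, rng.random() < 0.02)
              for g in ("A", "B") for _ in range(50_000)]

# Assumed collection bias: group B is stopped five times as often.
database = simulate_stops(population, {"A": 0.05, "B": 0.25})

print("share of B in population:", group_share(population, "B"))
print("share of B in database:  ", round(group_share(database, "B"), 3))
```

Under these assumed rates, group B makes up half of the population but, in expectation, about five sixths of the recorded stops. A model trained on such a database sees a population that differs from the real one, although the behaviour of the two groups is the same.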

Relations between Attributes in Data

Second, the attributes that characterize our subjects may not be independent of each other. For example, the postal code of a person may be highly correlated with ethnicity, since people may tend to choose to live close to relatives, acquaintances or a community (see Rice, 1996 for more examples in lending). Marital status may be correlated with gender: for instance, statuses such as “wife” or “husband” directly encode gender, while “divorced” does not relate to gender.
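The marital status example can be checked on a toy table. This is a minimal sketch with hand-made, hypothetical records; the function name majority_guess_accuracy is introduced here for illustration only. For each attribute value it measures how often gender is guessed correctly by always choosing the most frequent gender for that value.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (marital_status, gender)
records = [
    ("wife", "F"), ("wife", "F"), ("wife", "F"),
    ("husband", "M"), ("husband", "M"), ("husband", "M"),
    ("divorced", "F"), ("divorced", "M"),
    ("divorced", "F"), ("divorced", "M"),
]

def majority_guess_accuracy(records):
    """For each attribute value, the fraction of records whose gender
    matches the most frequent gender among records with that value."""
    by_value = defaultdict(Counter)
    for value, gender in records:
        by_value[value][gender] += 1
    return {value: max(counts.values()) / sum(counts.values())
            for value, counts in by_value.items()}

accuracy = majority_guess_accuracy(records)
for value, acc in accuracy.items():
    print(f"{value:9s} -> gender guessed correctly {acc:.0%} of the time")
```

In this toy table "wife" and "husband" reveal gender in every record, while "divorced" does no better than a coin flip. An attribute that looks neutral can therefore carry a sensitive attribute for part of the population.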

If the attributes are independent, every attribute contributes its separate share to the decision making in the model. If variables are related to each other, it is not straightforward to identify and control which variable contributes to what extent to the final prediction. Moreover, it is often impossible to collect all the attributes of a subject or to take all the environmental factors into account with a model. Therefore our data may be incomplete, i.e., missing some information, and some hidden information may be transferred indirectly via correlated attributes.
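Why related variables make contributions hard to identify can be seen in an extreme case. The sketch below assumes a linear model and hypothetical data in which the second attribute is fully determined by the first; real correlations are weaker, but the same ambiguity appears in a milder form.

```python
def predict(weights, rows):
    """Linear model without intercept: weighted sum of the attributes."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

# Hypothetical data: attribute 2 is always twice attribute 1.
rows = [(x1, 2 * x1) for x1 in range(1, 6)]

# Three different ways to divide the "contribution" between attributes.
model_a = (3.0, 0.0)    # everything attributed to attribute 1
model_b = (1.0, 1.0)    # contribution shared
model_c = (-1.0, 2.0)   # attribute 1 even receives a negative weight

predictions = [predict(m, rows) for m in (model_a, model_b, model_c)]
print(predictions[0])
print(predictions[0] == predictions[1] == predictions[2])
```

All three weight vectors give identical predictions on this data, so the data alone cannot tell how much each attribute contributes. For the same reason, dropping one of two related attributes need not change the decisions, because the remaining attribute carries the same information.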
