all the individuals as if they earn the same and thus lose the opportunity to
improve upon accuracy for people with very high and very low income stability.
If the two assumptions are satisfied, it is reasonable to expect that models will
transfer the knowledge from the historical data to future decision making. On
the other hand, if the historical data is prejudiced, the models trained on
this data can be expected to yield prejudiced decisions. As we will see in the
following subsection, the assumptions may not hold in reality due to the origins of
the data. If the i.i.d. assumptions are not satisfied, the computational models built
in such settings might still be valid; however, the possible effects of these breaches
need to be taken into account when interpreting the results.
3.2.2 Origins of Training Data
In order to identify the sources of possible discrimination in trained models, we
need to analyze the origins and the characteristics of the training data.
Data Collection
First of all, the data collection process may be intentionally or unintentionally
biased. For instance, Turner & Skidmore (1999) discuss different stages of the
mortgage lending process that may lead to racial discrimination. Advertising and
promotions can be sent to selected neighborhoods. Pre-application
consultancy may be offered on a biased basis. These actions may lead to a situation
in which the historical database of applicants does not represent the potential
clients. Other examples of biased data collection include racial profiling of crime
suspects or selecting people for further security checks at airports. If people of
particular ethnic backgrounds are stopped for searches more often, even if they
were never convicted of carrying forbidden items, the historical database will
contain a skewed representation of the population.
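The effect of unequal stop rates on the recorded data can be illustrated with a minimal sketch. All numbers, group labels and the function name recorded_shares are hypothetical and chosen only for illustration; they do not come from any study cited here.

```python
def recorded_shares(population, stop_rate):
    """Return each group's share among the recorded stops."""
    stops = {g: population[g] * stop_rate[g] for g in population}
    total = sum(stops.values())
    return {g: stops[g] / total for g in stops}

# Hypothetical numbers: group B makes up 10% of the population but is
# stopped nine times as often as group A.
population = {"A": 9000, "B": 1000}
stop_rate = {"A": 0.01, "B": 0.09}

population_share_b = population["B"] / sum(population.values())
database_share_b = recorded_shares(population, stop_rate)["B"]
print(population_share_b, database_share_b)
```

Under these assumed rates, group B accounts for about a tenth of the population but about half of the records, so a model trained on the records sees a distribution that differs from the population it will be applied to.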
Relations between Attributes in Data
Second, the attributes that characterize our subjects may not be independent of
each other. For example, the postal code of a person may be highly correlated with
ethnicity, since people may tend to choose to live close to relatives, acquaintances
or a community (see Rice, 1996 for more examples in lending). Marital status
may be correlated with gender: for instance, statuses such as “wife” or “husband”
directly encode gender, while “divorced” does not relate to gender.
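One possible way to quantify such a relation between two categorical attributes is Cramér's V, which ranges from 0 (no association) to 1 (one attribute determines the other). The sketch below is an illustration only: the choice of measure, the function name cramers_v and the record counts are assumptions, not part of the original analysis.

```python
from collections import Counter
from math import sqrt

def cramers_v(pairs):
    """Cramér's V for a list of (a, b) pairs of categorical values."""
    n = len(pairs)
    joint = Counter(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)
    k = min(len(rows), len(cols)) - 1
    if n == 0 or k == 0:
        return 0.0  # association is undefined; report none
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

# Hypothetical records of (marital status, gender).
records = (
    [("wife", "F")] * 30 + [("husband", "M")] * 30
    + [("divorced", "F")] * 20 + [("divorced", "M")] * 20
)
v_status = cramers_v(records)

# Keeping only the "divorced or not" distinction removes the gender signal.
divorced_flags = [(status == "divorced", gender) for status, gender in records]
v_divorced = cramers_v(divorced_flags)
print(v_status, v_divorced)
```

With these made-up counts the full marital status is strongly associated with gender, while the "divorced or not" flag alone shows no association, which mirrors the distinction drawn above.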
If the attributes are independent, every attribute contributes its separate share to
the decision making in the model. If variables are related to each other, it is not
straightforward to identify and control which variable contributes to what extent to
the final prediction. Moreover, it is often impossible to collect all the attributes of
a subject or take all the environmental factors into account with a model.
Therefore our data may be incomplete, i.e., missing some information, and some hidden
information may be transferred indirectly via correlated attributes.
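The indirect transfer of hidden information can be sketched with a toy example. Here a deliberately simple "model" (a majority vote per postal code) never sees the group attribute, yet its decisions differ sharply between groups because postal code and group are correlated in the data. The postal codes, group labels, counts and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history of (postal_code, group, decision), 1 = accepted.
history = (
    [("1000", "X", 1)] * 40 + [("1000", "Y", 1)] * 5 + [("1000", "X", 0)] * 5
    + [("2000", "Y", 0)] * 40 + [("2000", "X", 0)] * 5 + [("2000", "Y", 1)] * 5
)

def fit_majority_by_code(rows):
    """Learn the most frequent decision per postal code; group is ignored."""
    votes = defaultdict(Counter)
    for code, _group, label in rows:
        votes[code][label] += 1
    return {code: c.most_common(1)[0][0] for code, c in votes.items()}

def positive_rate_by_group(rows, model):
    """Share of each group that the model would accept."""
    accepted, total = Counter(), Counter()
    for code, group, _label in rows:
        total[group] += 1
        accepted[group] += model[code]
    return {g: accepted[g] / total[g] for g in total}

model = fit_majority_by_code(history)
rates = positive_rate_by_group(history, model)
print(model, rates)
```

In this constructed data set, removing the group attribute from the input does not remove its influence: the postal code carries most of the same information.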