how this leads to discrimination problems. Because of the opportunities presented
by the growing amount of data available for analysis, automatic classification is
gaining importance. It is therefore necessary to develop classification techniques
that prevent this unwanted behavior.
Building discrimination-free computational models from biased, incorrect or
incomplete data is, however, still in its early stages, even though a number of
case studies searching for evidence of discrimination are available (see e.g. Turner &
Skidmore, 1999). Removing discrimination from computational models is challenging.
Because data are incomplete and there are underlying relations between variables
(for instance, other attributes can be correlated with the sensitive one), it is not
sufficient to remove the sensitive attribute or to apply separate treatment to the
sensitive groups.
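Why removing the sensitive attribute is not enough can be illustrated with a small sketch. It assumes a hypothetical toy dataset in which postal code is strongly correlated with membership of the protected group and the historical decisions are biased; all names and numbers are invented for illustration. The "classifier" here is deliberately simple (the majority historical label per postal code) and never sees the sensitive attribute, yet it reproduces the bias through the correlated attribute.

```python
from collections import Counter, defaultdict


def make_toy_data():
    """Hypothetical records (group, postal_code, historical_label).

    Group 1 is the protected group and lives mostly in postal code "B".
    The historical labels are deliberately biased: group 0 was always
    accepted (label 1) and group 1 was always rejected (label 0).
    """
    data = []
    for group, code, count in [(0, "A", 9), (1, "A", 1), (0, "B", 1), (1, "B", 9)]:
        label = 1 if group == 0 else 0
        data.extend([(group, code, label)] * count)
    return data


def train_without_sensitive(data):
    """Learn the majority historical label per postal code; the group is never used."""
    votes = defaultdict(Counter)
    for _group, code, label in data:
        votes[code][label] += 1
    return {code: counts.most_common(1)[0][0] for code, counts in votes.items()}


def positive_rate_by_group(data, model):
    """Share of positive predictions the model gives to each group."""
    totals, positives = Counter(), Counter()
    for group, code, _label in data:
        totals[group] += 1
        positives[group] += model[code]
    return {group: positives[group] / totals[group] for group in totals}


if __name__ == "__main__":
    data = make_toy_data()
    model = train_without_sensitive(data)
    print(model)                                # {'A': 1, 'B': 0}
    print(positive_rate_by_group(data, model))  # {0: 0.9, 1: 0.1}
```

On this toy data the model accepts 90% of group 0 but only 10% of group 1, because the postal code acts as a proxy for the removed sensitive attribute.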
In the last few years several non-discriminatory computational modeling techniques
have been developed, but large challenges remain. In our view, two challenges
require urgent research attention before non-discriminatory classification
techniques can be deployed in applications. The first challenge is how to measure
discrimination in real, complex data with many attributes. By definition, a model
is discriminatory if it yields different predictions for candidates who differ only
in the sensitive attribute and are otherwise identical. If real application data is
complex, it is unlikely that every data point has an "identical twin" that differs
only in the value of the sensitive attribute. To solve this problem, notions and
approximations of the similarity of individuals for non-discriminatory
classification need to be established that are both legally grounded and sensible
from a data mining perspective. The second major challenge is how to determine
which part of the information carried by a sensitive (or correlated) attribute is
sensitive and which is objective, as in the example of a postal code that carries
both ethnicity information and real estate information. Likewise, notions of
partial explainability of decisions by individual attributes or groups of
attributes need to be established, and they too need to be legally grounded and
sensible from a data mining perspective.
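When a model can be queried, the definition of discrimination translates directly into a test: flip only the sensitive attribute and check whether the prediction changes. When only historical decisions are available and no exact twin exists, one possible approximation, in the spirit of k-nearest-neighbour situation testing, compares each protected individual with the most similar non-protected individuals. The sketch below assumes records stored as dictionaries with a binary sensitive attribute; the function names, feature names, unscaled Euclidean distance and choice of k are illustrative assumptions, and choosing such a similarity notion well is precisely the open problem described above.

```python
import math


def flip_rate(predict, records, sensitive):
    """Fraction of records whose prediction changes when only `sensitive` is flipped.

    Each record is compared with an artificial twin that is identical in
    every attribute except the (binary) sensitive one.
    """
    changed = 0
    for record in records:
        twin = dict(record)
        twin[sensitive] = 1 - record[sensitive]
        if predict(twin) != predict(record):
            changed += 1
    return changed / len(records)


def knn_twin_gap(records, features, sensitive, outcome, k=1):
    """Approximate 'identical twins' by nearest neighbours (an assumption, not a standard).

    For every protected record (sensitive == 1), take the k most similar
    unprotected records on the given features and compare their mean outcome
    with the protected record's outcome. A positive mean gap means protected
    individuals received worse outcomes than their closest counterparts.
    Features are used unscaled here; real data would need a deliberate
    choice of scaling and distance.
    """
    protected = [r for r in records if r[sensitive] == 1]
    unprotected = [r for r in records if r[sensitive] == 0]
    gaps = []
    for record in protected:
        point = [record[f] for f in features]
        nearest = sorted(
            unprotected, key=lambda u: math.dist(point, [u[f] for f in features])
        )[:k]
        twin_outcome = sum(u[outcome] for u in nearest) / len(nearest)
        gaps.append(twin_outcome - record[outcome])
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical applicants: income in thousands, years of employment.
    records = [
        {"income": 30, "years": 2, "protected": 0, "decision": 1},
        {"income": 50, "years": 10, "protected": 0, "decision": 1},
        {"income": 31, "years": 2, "protected": 1, "decision": 0},
        {"income": 49, "years": 9, "protected": 1, "decision": 1},
    ]

    def biased(r):
        return int(r["income"] >= 40 and r["protected"] == 0)

    def unaware(r):
        return int(r["income"] >= 40)

    print(flip_rate(biased, records, "protected"))   # 0.5
    print(flip_rate(unaware, records, "protected"))  # 0.0
    print(knn_twin_gap(records, ["income", "years"], "protected", "decision"))  # 0.5
```

A flip rate of zero only shows that a model does not use the sensitive attribute directly; as discussed earlier, a correlated attribute can still carry the discrimination. The nearest-neighbour gap, in turn, is only as meaningful as the similarity notion behind it.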
References
Blank, R., Dabady, M., Citro, C.: Measuring Racial Discrimination. Natl Academy Press (2004)
Jonah, B.A.: Accident risk and risk-taking behavior among young drivers. Accident Analysis & Prevention 18(4), 255-271 (1986)
Calders, T., Verwer, S.: Three Naive Bayes Approaches for Discrimination-Free Classification. Data Mining and Knowledge Discovery 21(2), 277-292 (2010)
Distance Learning Center: Internet Based Benefit and Compensation Administration: Discrimination in Pay, ch. 26 (2009), http://www.eridlc.com/index.cfm?fuseaction=textbook.chpt26 (accessed: November 2011)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. John Wiley & Sons (2001)
Fang, H., Moro, A.: Theories of Statistical Discrimination and Affirmative Action: A Survey. In: Benhabib, J., Bisin, A., Jackson, M. (eds.) Handbook of Social Economics, pp. 133-200 (2010)