Table 5.1 The German credit case study: attributes (top) and an excerpt of the dataset (bottom)
Attributes
  on personal properties: checking account status, duration, savings status, property magnitude, type of housing
  on credits: credit history, credit request purpose, credit request amount, installment commitment, existing credits, other parties, other payment
  on employment: job type, employment since, number of dependents, own telephone
  on personal status: personal status and gender, age, resident since, foreign worker
Decision
  CLASS, with values GOOD (grant credit) and BAD (deny credit)
Potentially discriminatory (PD) items
  PERSONAL STATUS = FEMALE (female)
  AGE = GT 52 (senior people)
  FOREIGN WORKER = YES (foreign workers)
PERS STATUS    AGE        JOB        PURPOSE     CREDIT AMNT   HOUSING    ...   CLASS
female         gt 52      self emp   new car     lt 38k        rent       ...   bad
male married   30 to 41   unemp      used car    39k to 75k    own        ...   good
male single    42 to 51   skilled    business    75k to 111k   for free   ...   good
female         gt 52      unemp      furniture   lt 38k        own        ...   bad
...            ...        ...        ...         ...           ...        ...   ...
5.2 Classification Rules for Discrimination Discovery
As a running example throughout the chapter, we refer to the German credit dataset,
which is in the public domain and available from the UCI repository of machine learning
datasets (Newman, Hettich, Blake, & Merz, 1998). The dataset consists of 1000
records of bank account holders. It includes 20 nominal (or discretized) attributes,
as shown in Table 5.1. The decision attribute takes values representing the good/bad
creditor classification of the bank account holder.
5.2.1 Classification Rules
Given a relation with n attributes, we refer to an item as an expression a = v, where
a is an attribute and v one of its possible values. For example,
PERSONAL STATUS = MALE SINGLE is an item for the German credit dataset. One of the
attributes is taken as the class attribute, i.e., the attribute referring to the decision.
In our running example, the class is named CLASS and the two possible items are
CLASS = GOOD, that is, credit is granted, and CLASS = BAD, that is, credit is denied.
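As a minimal illustration of these definitions, the sketch below represents an item as
an (attribute, value) pair and picks out the items of the class attribute. The names
Item, CLASS_ATTRIBUTE, and is_class_item are hypothetical, introduced only for this
example; the attribute names and values are those of the running example.

```python
from typing import NamedTuple


class Item(NamedTuple):
    """An item a = v: an attribute paired with one of its possible values."""
    attribute: str
    value: str

    def __str__(self) -> str:
        return f"{self.attribute} = {self.value}"


# Assumption: the class attribute is identified by its name, as in the running example.
CLASS_ATTRIBUTE = "CLASS"


def is_class_item(item: Item) -> bool:
    """Return True if the item refers to the decision (class) attribute."""
    return item.attribute == CLASS_ATTRIBUTE


# Items taken from the running example.
male_single = Item("PERSONAL STATUS", "MALE SINGLE")
credit_granted = Item("CLASS", "GOOD")
credit_denied = Item("CLASS", "BAD")

print(male_single)
print(is_class_item(male_single), is_class_item(credit_granted))
```

Because items are plain immutable tuples, they can be compared for equality and stored
in sets, which is convenient when rows of a table are later viewed as sets of items.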
A transaction T is a set of items, one for each attribute of the relation. Intuitively,
a transaction is the set of items corresponding to a row of a table. By an itemset X we
mean a set of items, and we say that a transaction T supports an itemset X if every
item in X belongs to T as well, in symbols X ⊆ T. As an example, the transaction
corresponding to the first row in Table 5.1 supports the itemset PERSONAL STATUS =