Metrics for measuring success in removing discriminatory rules are given in Section 13.5. Data quality metrics are listed in Section 13.6. Section 13.7 contains experimental results for the direct discrimination prevention methods proposed. Conclusions and suggestions for future work are summarized in Section 13.8.
13.2 Preliminaries
In this section we briefly recall some basic concepts which are useful to better understand the study presented in this chapter.
13.2.1 Basic Notions
• A dataset is a collection of data objects (records) and their attributes. Let DB be the original dataset.
• An item is an attribute along with its value, e.g. {Race=black}.
• An itemset, i.e. X, is a collection of one or more items, e.g. {Foreign worker=Yes, City=NYC}.
• A classification rule is an expression X → C, where C is a class item (a yes/no decision), and X is an itemset containing no class item, e.g. {Foreign worker=Yes, City=NYC} → {hire=no}. X is called the premise of the rule.
• The support of an itemset, supp(X), is the fraction of records that contain the itemset X. We say that a rule X → C is completely supported by a record if both X and C appear in the record.
• The confidence of a classification rule, conf(X → C), measures how often the class item C appears in records that contain X. Hence, if supp(X) > 0, then
    conf(X → C) = supp(X, C) / supp(X).
  Support and confidence range over [0,1].
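As an illustration of the support and confidence definitions, the following is a minimal Python sketch, not the chapter's implementation. It assumes each record is represented as a set of "Attribute=value" strings; the function names `supp` and `conf` mirror the notation in the text, while the toy dataset `DB` and its values are hypothetical.

```python
from typing import FrozenSet, List

Item = str                 # assumed encoding: "Attribute=value"
Record = FrozenSet[Item]   # a record is the set of items it contains


def supp(db: List[Record], itemset: FrozenSet[Item]) -> float:
    """Fraction of records in db that contain every item of itemset."""
    if not db:
        return 0.0
    return sum(1 for record in db if itemset <= record) / len(db)


def conf(db: List[Record], premise: FrozenSet[Item], class_item: Item) -> float:
    """conf(X -> C) = supp(X, C) / supp(X); only defined when supp(X) > 0."""
    supp_x = supp(db, premise)
    if supp_x == 0:
        raise ValueError("confidence is undefined when supp(X) = 0")
    return supp(db, premise | {class_item}) / supp_x


# Hypothetical toy dataset, for illustration only.
DB = [
    frozenset({"Foreign worker=Yes", "City=NYC", "hire=no"}),
    frozenset({"Foreign worker=Yes", "City=NYC", "hire=no"}),
    frozenset({"Foreign worker=Yes", "City=NYC", "hire=yes"}),
    frozenset({"Foreign worker=No", "City=NYC", "hire=yes"}),
]

X = frozenset({"Foreign worker=Yes", "City=NYC"})  # premise of the rule
C = "hire=no"                                      # class item

print(supp(DB, X))        # 3 of 4 records contain X
print(supp(DB, X | {C}))  # 2 of 4 records contain both X and C
print(conf(DB, X, C))     # ratio of the two supports, i.e. 2/3
```

In this toy data, two of the three records containing X also contain the class item, so the rule X → {hire=no} has confidence 2/3 while its premise has support 3/4.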
• A frequent classification rule is a classification rule with a support or confidence greater than a specified lower bound. Let FR be the database of frequent classification rules extracted from DB.
• Discriminatory attributes and itemsets (protected by law): Attributes are classified as discriminatory according to the applicable anti-discrimination acts (laws). For instance, U.S. federal laws prohibit discrimination on the basis of the following attributes: race, color, religion, nationality, sex, marital status, age and pregnancy (Pedreschi et al. 2008). Hence these attributes are regarded as discriminatory and the itemsets corresponding to them are called discriminatory itemsets. {Gender=Female, Race=Black} is just an example of a discriminatory itemset. Let DA_s be the set of predetermined discriminatory attributes in DB and DI_s be the set of predetermined discriminatory itemsets in DB.
• Non-discriminatory attributes and itemsets: If A_s is the set of all the attributes in DB and I_s the set of all the itemsets in DB, then nDA_s (i.e. set of