= FEMALE, AGE = GT 52 but not PERSONAL STATUS = MALE SINGLE, AGE = GT 52.
A dataset $\mathcal{D}$ is a set of transactions. Intuitively, it corresponds to
the transactions built from a table. The support of an itemset X w.r.t.
$\mathcal{D}$ is the proportion of transactions in $\mathcal{D}$ supporting X:

$$\mathit{supp}(X) = |\{ T \in \mathcal{D} \mid X \subseteq T \}| / |\mathcal{D}|,$$

where $|\cdot|$ is the cardinality operator.
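The definition of support can be sketched in a few lines of Python. This is an illustrative sketch only: the helper names `to_transactions` and `supp`, the `ATTRIBUTE = value` string encoding of items, the value `LE 52`, and the four-row table are assumptions made for the example, not part of the original text or of the German credit dataset.

```python
from typing import Dict, FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def to_transactions(rows: Iterable[Dict[str, str]]) -> List[Transaction]:
    """Build one transaction per table row; each item is an 'ATTRIBUTE = value' string."""
    return [frozenset(f"{attr} = {val}" for attr, val in row.items()) for row in rows]


def supp(itemset: Iterable[str], dataset: List[Transaction]) -> float:
    """Proportion of transactions T in the dataset such that itemset is a subset of T."""
    if not dataset:
        raise ValueError("support is undefined for an empty dataset")
    x = frozenset(itemset)
    return sum(1 for t in dataset if x <= t) / len(dataset)


# Hypothetical four-row table (assumption: not real German credit data).
rows = [
    {"PERSONAL STATUS": "FEMALE", "AGE": "GT 52"},
    {"PERSONAL STATUS": "FEMALE", "AGE": "GT 52"},
    {"PERSONAL STATUS": "MALE SINGLE", "AGE": "GT 52"},
    {"PERSONAL STATUS": "FEMALE", "AGE": "LE 52"},
]
dataset = to_transactions(rows)

x = {"PERSONAL STATUS = FEMALE", "AGE = GT 52"}
print(supp(x, dataset))  # 2 of the 4 transactions contain both items: 0.5
```

Representing transactions as frozensets makes the condition "T supports X" a plain subset test (`x <= t`), which mirrors the set-based definition directly.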
An association rule is an expression $X \rightarrow Y$, where X and Y are
disjoint itemsets. X is called the premise and Y is called the consequence of
the association rule. We say that $X \rightarrow Y$ is a classification rule if
Y is a class item. As an example,
PERSONAL STATUS = FEMALE, AGE = GT 52 $\rightarrow$ CLASS = BAD
is a classification rule for the German credit dataset.
The support of $X \rightarrow Y$ is the support of the itemset obtained by the
union of X and Y, in symbols $\mathit{supp}(X, Y)$, where X, Y is the union of X
and Y. Intuitively, the support of a rule states how often the rule is satisfied
in the dataset. A support of 0.1 for the rule
PERSONAL STATUS = FEMALE, AGE = GT 52 $\rightarrow$ CLASS = BAD
means that 10% of the transactions support both the premise and the consequence
of the rule, i.e., support PERSONAL STATUS = FEMALE, AGE = GT 52, CLASS = BAD.
The confidence of $X \rightarrow Y$, defined when $\mathit{supp}(X) > 0$, is:

$$\mathit{conf}(X \rightarrow Y) = \mathit{supp}(X, Y) / \mathit{supp}(X).$$

Confidence states the proportion of transactions supporting Y among those
supporting X. A confidence of 0.7 for the rule above means that 70% of the
transactions supporting PERSONAL STATUS = FEMALE, AGE = GT 52 also support
CLASS = BAD.
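The two rule measures can be illustrated with a small Python sketch. Everything concrete here is an assumption made for the example: the function names `supp`, `rule_supp` and `conf`, the item `AGE = LE 52`, and the ten-transaction toy dataset, which is not drawn from the German credit data. Exact fractions are used instead of floats so that the ratio of two supports is not affected by rounding.

```python
from fractions import Fraction
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def supp(itemset: Iterable[str], dataset: List[Transaction]) -> Fraction:
    """Exact proportion of transactions that contain every item of the itemset."""
    if not dataset:
        raise ValueError("support is undefined for an empty dataset")
    x = frozenset(itemset)
    return Fraction(sum(1 for t in dataset if x <= t), len(dataset))


def rule_supp(premise: Iterable[str], consequence: Iterable[str],
              dataset: List[Transaction]) -> Fraction:
    """supp(X, Y): support of the union of premise and consequence."""
    return supp(frozenset(premise) | frozenset(consequence), dataset)


def conf(premise: Iterable[str], consequence: Iterable[str],
         dataset: List[Transaction]) -> Fraction:
    """conf(X -> Y) = supp(X, Y) / supp(X), defined only when supp(X) > 0."""
    premise_supp = supp(premise, dataset)
    if premise_supp == 0:
        raise ValueError("confidence is undefined when supp(premise) = 0")
    return rule_supp(premise, consequence, dataset) / premise_supp


F, M = "PERSONAL STATUS = FEMALE", "PERSONAL STATUS = MALE SINGLE"
OLD, YOUNG = "AGE = GT 52", "AGE = LE 52"  # "AGE = LE 52" is a hypothetical item
BAD, GOOD = "CLASS = BAD", "CLASS = GOOD"

# Hypothetical toy dataset of 10 transactions.
dataset = (
    [frozenset({F, OLD, BAD})] * 3
    + [frozenset({F, OLD, GOOD})] * 1
    + [frozenset({M, OLD, GOOD})] * 2
    + [frozenset({F, YOUNG, GOOD})] * 4
)

premise, consequence = {F, OLD}, {BAD}
print(rule_supp(premise, consequence, dataset))  # 3/10
print(conf(premise, consequence, dataset))       # 3/4
```

In this toy dataset, 4 of the 10 transactions support the premise and 3 of those also support CLASS = BAD, so the rule has support 3/10 and confidence 3/4.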
Support and confidence range over [0, 1]. Since the seminal paper by Agrawal &
Srikant (1994), many well-explored algorithms have been designed for extracting
the set of frequent itemsets, i.e., itemsets with a specified minimum support.
A survey on frequent pattern mining is due to Han et al. (2007); a survey on
interestingness measures for association rules is reported by Geng & Hamilton
(2006); a repository of implementations is maintained by Goethals (2010).
5.2.2 Measures of Discrimination
A critical problem in the analysis of discrimination is precisely to quantify
the degree of discrimination suffered by a given group (say, an ethnic group) in
a given context (say, a geographic area and/or an income range) with respect to
a decision (say, credit denial). We rephrase this problem in a rule-based
setting: if A is the condition (i.e., the itemset) that characterizes the group
suspected of being discriminated against, B is the itemset that characterizes
the context, and C is the decision (class) item, then the analysis of
discrimination is pursued by studying the rule $A, B \rightarrow C$, together
with its confidence with respect to the underlying decision dataset, namely, how
often such a rule is true in the dataset itself.
Civil rights laws explicitly identify the groups to be protected against
discrimination, e.g., women or black people. With our syntax, those groups can
be represented as items, e.g., SEX = FEMALE or RACE = BLACK. Therefore, we can
assume that the laws provide us with a set of items, which we call potentially
discriminatory (PD)