The difference between the DRP and IRP methods shown in Tables 1 and 3 lies in
the set of records chosen for transformation. As shown in Table 3, in IRP the
chosen records must not satisfy the itemset D (the chosen records are those with
~A, B, ~D → ~C), whereas DRP does not consider D at all (the chosen records are
those with ~A, B → ~C).
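The record-selection difference can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each record is a set of items, that a record "satisfies" an itemset when it contains all of its items, and that ~X simply means "does not satisfy X". The function names `satisfies` and `select_records` are hypothetical and not taken from the cited methods.

```python
from typing import FrozenSet, List, Optional

# Assumption: a record is modelled as a frozen set of items.
Record = FrozenSet[str]


def satisfies(record: Record, itemset: FrozenSet[str]) -> bool:
    """A record satisfies an itemset if it contains every item in it."""
    return itemset <= record


def select_records(
    db: List[Record],
    a: FrozenSet[str],
    b: FrozenSet[str],
    c: FrozenSet[str],
    d: Optional[FrozenSet[str]] = None,
) -> List[Record]:
    """Return the records chosen for transformation.

    d is None -> DRP-style selection: records with ~A, B -> ~C.
    d given   -> IRP-style selection: records with ~A, B, ~D -> ~C.
    """
    chosen = []
    for record in db:
        # Both methods: keep only records with ~A, B and ~C.
        if satisfies(record, a) or not satisfies(record, b) or satisfies(record, c):
            continue
        # IRP only: additionally exclude records that satisfy D.
        if d is not None and satisfies(record, d):
            continue
        chosen.append(record)
    return chosen


if __name__ == "__main__":
    A, B, C, D = (frozenset({x}) for x in "abcd")
    db = [frozenset("b"), frozenset("bd"), frozenset("ab"), frozenset("bc")]
    print(len(select_records(db, A, B, C)))     # DRP-style: 2
    print(len(select_records(db, A, B, C, D)))  # IRP-style: 1
```

The record {b, d} is chosen by the DRP-style selection but skipped by the IRP-style one, because it satisfies D.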
13.5 Measuring Discrimination Removal
Discrimination prevention methods should be evaluated based on two aspects:
discrimination removal and data quality. We deal with the first aspect in this section:
how successful the method is at removing all evidence of direct and/or indirect
discrimination from the original dataset. To measure discrimination removal, four
metrics were proposed in Hajian et al. (2011a and 2011b) and Hajian and
Domingo-Ferrer (2012):
Direct Discrimination Prevention Degree (DDPD). This measure quantifies
the percentage of discriminatory rules that are no longer discriminatory in the
transformed dataset.
Direct Discrimination Protection Preservation (DDPP). This measure
quantifies the percentage of protective rules in the original dataset that remain
protective in the transformed dataset.
Indirect Discrimination Prevention Degree (IDPD). This measure quantifies
the percentage of redlining rules that are no longer redlining in the transformed
dataset.
Indirect Discrimination Protection Preservation (IDPP). This measure
quantifies the percentage of non-redlining rules in the original dataset that
remain non-redlining in the transformed dataset.
Since the above measures are used to evaluate the success of the proposed
methods in direct and indirect discrimination prevention, ideally their value should
be 100%.
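The same ratios can be computed with a short Python sketch. It assumes that each pair of rule sets (original and transformed) has already been mined and classified; the mining step and the rule representation are outside the sketch, so rules are treated as arbitrary hashable values. The function names are hypothetical, and the behaviour for an empty original set is an assumption.

```python
from typing import Hashable, Set


def prevention_degree(before: Set[Hashable], after: Set[Hashable]) -> float:
    """DDPD / IDPD: percentage of rules in `before` that are absent from `after`."""
    if not before:
        return 100.0  # assumption: nothing to remove counts as full success
    return 100.0 * len(before - after) / len(before)


def protection_preservation(before: Set[Hashable], after: Set[Hashable]) -> float:
    """DDPP / IDPP: percentage of rules in `before` that are still in `after`."""
    if not before:
        return 100.0  # assumption: nothing to preserve counts as full success
    return 100.0 * len(before & after) / len(before)


if __name__ == "__main__":
    mr, mr_t = {"r1", "r2", "r3", "r4"}, {"r4"}   # discriminatory rules
    pr, pr_t = {"p1", "p2"}, {"p1", "p2", "p3"}   # protective rules
    print(prevention_degree(mr, mr_t))        # DDPD = 75.0
    print(protection_preservation(pr, pr_t))  # DDPP = 100.0
```

In this toy example, the transformation leaves one of four discriminatory rules intact, so DDPD falls short of the ideal 100%, while every protective rule is preserved. A new protective rule (p3) in the transformed dataset does not affect DDPP, since only rules from the original dataset are counted.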
13.6 Measuring Data Quality
The second aspect in evaluating discrimination prevention methods is how much
information loss (i.e., data quality loss) they cause. To measure data quality, two
metrics were proposed in Verykios and Gkoulalas-Divanis (2008):
Misses Cost (MC). This measure quantifies the percentage of rules among
those extractable from the original dataset that cannot be extracted from the
transformed dataset (side-effect of the transformation process).
Ghost Cost (GC). This measure quantifies the percentage of rules among
those extractable from the transformed dataset that were not extractable from
the original dataset (side-effect of the transformation process).
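A minimal Python sketch of both measures follows. As above, it assumes that the rules have already been extracted from each dataset (presumably with the same mining parameters) and are represented as hashable values; the function names and the handling of empty rule sets are assumptions.

```python
from typing import Hashable, Set


def misses_cost(original: Set[Hashable], transformed: Set[Hashable]) -> float:
    """MC: percentage of original rules that cannot be extracted after transformation."""
    if not original:
        return 0.0  # assumption: with no original rules, nothing can be missed
    return 100.0 * len(original - transformed) / len(original)


def ghost_cost(original: Set[Hashable], transformed: Set[Hashable]) -> float:
    """GC: percentage of transformed-dataset rules that were not in the original."""
    if not transformed:
        return 0.0  # assumption: with no transformed rules, none can be ghosts
    return 100.0 * len(transformed - original) / len(transformed)


if __name__ == "__main__":
    original = {"r1", "r2", "r3", "r4"}
    transformed = {"r1", "r2", "g1", "g2", "g3"}
    print(misses_cost(original, transformed))  # 50.0 (r3 and r4 are lost)
    print(ghost_cost(original, transformed))   # 60.0 (g1, g2, g3 are new)
```

Note the different denominators: MC is relative to the rules of the original dataset, while GC is relative to the rules of the transformed dataset. Since both count side-effects of the transformation, lower values indicate less information loss.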