Direct and Indirect Discrimination Prevention Methods - Discrimination and Privacy in the Information Society


The DRP and IRP methods shown in Tables 1 and 3 differ in the set of records chosen for transformation. As shown in Table 3, in IRP the chosen records should not satisfy the D itemset (chosen records are those with ~A, B, ~D → ~C), whereas DRP does not take D into account at all (chosen records are those with ~A, B → ~C).
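The selection difference can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that records are dictionaries, that each itemset A, B, C, D is a dictionary of attribute/value pairs, and that "~X" means the record does not match every pair in X. The helper names and the sample attributes (gender, city, zip, credit) are hypothetical and do not come from the tables.

```python
def satisfies(record, itemset):
    """True if the record matches every attribute=value pair in the itemset."""
    return all(record.get(attr) == val for attr, val in itemset.items())


def drp_candidates(records, A, B, C):
    """Records with ~A, B -> ~C: B holds, while A and C do not. D is ignored."""
    return [r for r in records
            if not satisfies(r, A) and satisfies(r, B) and not satisfies(r, C)]


def irp_candidates(records, A, B, C, D):
    """Records with ~A, B, ~D -> ~C: as for DRP, but D must not hold either."""
    return [r for r in drp_candidates(records, A, B, C) if not satisfies(r, D)]


# Hypothetical itemsets and records, for illustration only.
A = {"gender": "female"}
B = {"city": "NYC"}
C = {"credit": "deny"}
D = {"zip": "10451"}

records = [
    {"gender": "male", "city": "NYC", "zip": "10451", "credit": "grant"},
    {"gender": "male", "city": "NYC", "zip": "10001", "credit": "grant"},
    {"gender": "female", "city": "NYC", "zip": "10001", "credit": "grant"},
    {"gender": "male", "city": "NYC", "zip": "10001", "credit": "deny"},
]

drp_chosen = drp_candidates(records, A, B, C)
irp_chosen = irp_candidates(records, A, B, C, D)
```

In this toy data the first record satisfies D, so it is a candidate for DRP but not for IRP. The second record is a candidate for both.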

13.5 Measuring Discrimination Removal

Discrimination prevention methods should be evaluated based on two aspects: discrimination removal and data quality. We deal with the first aspect in this section: how successful the method is at removing all evidence of direct and/or indirect discrimination from the original dataset. To measure discrimination removal, four metrics were proposed in Hajian et al. (2011a and 2011b) and Hajian and Domingo-Ferrer (2012):

• Direct Discrimination Prevention Degree (DDPD). This measure quantifies the percentage of discriminatory rules that are no longer discriminatory in the transformed dataset.

• Direct Discrimination Protection Preservation (DDPP). This measure quantifies the percentage of the protective rules in the original dataset that remain protective in the transformed dataset.

• Indirect Discrimination Prevention Degree (IDPD). This measure quantifies the percentage of redlining rules that are no longer redlining in the transformed dataset.

• Indirect Discrimination Protection Preservation (IDPP). This measure quantifies the percentage of non-redlining rules in the original dataset that remain non-redlining in the transformed dataset.

Since the above measures are used to evaluate the success of the proposed

methods in direct and indirect discrimination prevention, ideally their value should

be 100%.
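As a rough illustration, the four measures can be computed from sets of rules. The sketch below assumes that rules are represented as hashable identifiers (plain strings here) and that the discriminatory, protective, redlining and non-redlining rule sets have already been extracted from the original and transformed datasets. The function names and the sample rule identifiers are hypothetical.

```python
def _percentage(count, total):
    # The measure is undefined when the original rule set is empty;
    # None flags that case instead of dividing by zero.
    return None if total == 0 else 100.0 * count / total


def prevention_degree(flagged_before, flagged_after):
    """DDPD or IDPD: percentage of the rules that were discriminatory (or
    redlining) in the original dataset and no longer are after transformation."""
    before, after = set(flagged_before), set(flagged_after)
    return _percentage(len(before - after), len(before))


def protection_preservation(safe_before, safe_after):
    """DDPP or IDPP: percentage of the rules that were protective (or
    non-redlining) in the original dataset and remain so after transformation."""
    before, after = set(safe_before), set(safe_after)
    return _percentage(len(before & after), len(before))


# Hypothetical rule identifiers, for illustration only.
discriminatory_original = {"r1", "r2", "r3", "r4"}
discriminatory_transformed = {"r4"}
protective_original = {"p1", "p2"}
protective_transformed = {"p1", "p2", "r1"}

ddpd = prevention_degree(discriminatory_original, discriminatory_transformed)
ddpp = protection_preservation(protective_original, protective_transformed)
```

Here three of the four discriminatory rules stop being discriminatory, so DDPD is 75%, and both protective rules stay protective, so DDPP is 100%. The same two functions apply to IDPD and IDPP when given redlining and non-redlining rule sets.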

13.6 Measuring Data Quality

The second aspect on which to evaluate discrimination prevention methods is how much information loss (i.e., data quality loss) they cause. To measure data quality, two metrics are proposed in Verykios and Gkoulalas-Divanis (2008):

• Misses Cost (MC). This measure quantifies the percentage of rules among those extractable from the original dataset that cannot be extracted from the transformed dataset (side-effect of the transformation process).

• Ghost Cost (GC). This measure quantifies the percentage of the rules among those extractable from the transformed dataset that were not extractable from the original dataset (side-effect of the transformation process).
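Both costs reduce to set differences between the rules mined before and after the transformation. The following sketch assumes that rules are hashable identifiers and that both rule sets were mined with the same settings. The function names and sample rules are hypothetical, and returning 0.0 for an empty denominator is a convention chosen here, not one taken from the cited work.

```python
def misses_cost(rules_original, rules_transformed):
    """MC: percentage of the rules extractable from the original dataset
    that can no longer be extracted from the transformed dataset."""
    original, transformed = set(rules_original), set(rules_transformed)
    if not original:
        return 0.0  # assumption: no original rules means nothing can be missed
    return 100.0 * len(original - transformed) / len(original)


def ghost_cost(rules_original, rules_transformed):
    """GC: percentage of the rules extractable from the transformed dataset
    that were not extractable from the original dataset."""
    original, transformed = set(rules_original), set(rules_transformed)
    if not transformed:
        return 0.0  # assumption: no transformed rules means no ghost rules
    return 100.0 * len(transformed - original) / len(transformed)


# Hypothetical rule identifiers, for illustration only.
rules_before = {"a", "b", "c", "d"}
rules_after = {"a", "b", "e", "f", "g"}

mc = misses_cost(rules_before, rules_after)
gc = ghost_cost(rules_before, rules_after)
```

In this example two of the four original rules are lost (MC = 50%) and three of the five rules in the transformed dataset are new (GC = 60%). Lower values of both indicate better preservation of data quality.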
