Privacy Issues in Association Rule Mining - Frequent Pattern Mining

where $R_P(U')$ corresponds to the sensitive rules discovered in the sanitized dataset $U'$, $R_P(U)$ to the sensitive rules appearing in the original dataset $U$, and $|X|$ is the size of set $X$. Ideally, the hiding failure should be 0%.
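As a rough illustration, the hiding failure can be sketched in Python. The formula itself is not shown in this excerpt, so the ratio of sensitive rules still discoverable after sanitization to sensitive rules in the original dataset is an assumption inferred from the symbol definitions above. The function name `hiding_failure` and the string encoding of rules are hypothetical.

```python
def hiding_failure(sensitive_in_original, sensitive_in_sanitized):
    """Assumed form: |R_P(U')| / |R_P(U)|, returned as a fraction in [0, 1]."""
    original = set(sensitive_in_original)
    if not original:
        raise ValueError("the original dataset must contain at least one sensitive rule")
    # Only sensitive rules that were present originally can "survive" sanitization.
    survivors = set(sensitive_in_sanitized) & original
    return len(survivors) / len(original)


# Hypothetical rules, encoded as plain strings for brevity.
sensitive_before = {"a->b", "c->d", "e->f", "g->h"}
sensitive_after = {"c->d"}  # one sensitive rule was not hidden
hf = hiding_failure(sensitive_before, sensitive_after)
```

With these inputs one of the four sensitive rules remains discoverable, so the value is 0.25 (25%); a perfect sanitization would give 0.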

(b) Misses Cost (MC). This measure quantifies the percentage of the non-restrictive patterns that are hidden as a side-effect of the sanitization process. It is computed as follows:

$$\mathrm{MC} = \frac{|\tilde{R}_P(U)| - |\tilde{R}_P(U')|}{|\tilde{R}_P(U)|}$$

where $\tilde{R}_P(U)$ is the set of all non-sensitive rules in the original database $U$ and $\tilde{R}_P(U')$ is the set of all non-sensitive rules in the sanitized database $U'$. As one can notice, there exists a compromise between the misses cost and the hiding failure, since the more sensitive association rules one needs to hide, the more legitimate association rules one is expected to miss.
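The misses cost can be sketched in a few lines of Python. This is a minimal illustration, assuming rules are given as hashable values (strings here) and that the two arguments hold the non-sensitive rules mined from the original and from the sanitized database; the name `misses_cost` is hypothetical.

```python
def misses_cost(nonsensitive_original, nonsensitive_sanitized):
    """Fraction of non-sensitive rules lost through sanitization.

    Computes (|orig| - |san|) / |orig| over the two sets of non-sensitive rules.
    """
    original = set(nonsensitive_original)
    if not original:
        raise ValueError("the original database must contain non-sensitive rules")
    sanitized = set(nonsensitive_sanitized)
    return (len(original) - len(sanitized)) / len(original)


# Hypothetical example: five legitimate rules before, four after sanitization.
before = {"r1", "r2", "r3", "r4", "r5"}
after = {"r1", "r2", "r3", "r5"}
mc = misses_cost(before, after)
```

Here one of five legitimate rules is lost, giving a misses cost of 20%. Hiding more sensitive rules typically removes more supporting items, which tends to push this value up.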

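(c) Artifactual Patterns (AF). This measure quantifies the percentage of the discovered patterns that are artifacts. AF is computed as follows:

$$\mathrm{AF} = \frac{|P'| - |P \cap P'|}{|P'|}$$

where $P$ is the set of association rules discovered in the original database $U$ and $P'$ is the set of association rules discovered in the sanitized database $U'$.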
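A small Python sketch of the AF measure follows. It assumes AF is the share of rules mined from the sanitized database that do not appear among the rules mined from the original one; the function name `artifactual_patterns` and the rule labels are hypothetical.

```python
def artifactual_patterns(rules_original, rules_sanitized):
    """Fraction of rules in the sanitized result that are artifacts.

    Computes (|P'| - |P intersect P'|) / |P'|, where P and P' are the rule
    sets mined from the original and the sanitized database.
    """
    p = set(rules_original)
    p_prime = set(rules_sanitized)
    if not p_prime:
        raise ValueError("no rules were discovered in the sanitized database")
    return (len(p_prime) - len(p & p_prime)) / len(p_prime)


# Hypothetical example: two of the four rules mined after sanitization are new.
p = {"r1", "r2", "r3", "r4"}
p_prime = {"r2", "r3", "r9", "r10"}
af = artifactual_patterns(p, p_prime)
```

In this example `r9` and `r10` did not exist before sanitization, so half of the sanitized result consists of artifacts (AF = 0.5).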

(d) Dissimilarity (Diss). The measure of dissimilarity quantifies the difference between the original and the sanitized datasets by comparing their histograms, where the horizontal axis contains the items in the dataset and the vertical axis corresponds to their frequencies. It is calculated as follows:

$$\mathrm{Diss}(U, U') = \frac{1}{\sum_{i=1}^{n} f_U(i)} \times \sum_{i=1}^{n} \left[ f_U(i) - f_{U'}(i) \right]$$

where $f_X(i)$ represents the frequency of the $i$-th item in the dataset $X$, and $n$ is the number of distinct items in the original dataset $U$.
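The histogram comparison can be sketched as follows. This is only an illustration under stated assumptions: each dataset is a list of transactions (sets of items), the frequency of an item is taken to be the number of transactions containing it, and sanitization only removes items. The helper names `item_frequencies` and `dissimilarity` are hypothetical.

```python
from collections import Counter


def item_frequencies(transactions):
    """Histogram: item -> number of transactions that contain it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def dissimilarity(original, sanitized):
    """Sum of per-item frequency drops, normalized by total original frequency."""
    f_orig = item_frequencies(original)
    f_san = item_frequencies(sanitized)
    total = sum(f_orig.values())
    if total == 0:
        raise ValueError("the original dataset contains no items")
    # Counter returns 0 for items that vanished from the sanitized dataset.
    return sum(f_orig[item] - f_san[item] for item in f_orig) / total


# Hypothetical example: two item occurrences are removed by sanitization.
original = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
sanitized = [{"a", "c"}, {"a", "b"}, {"a"}, {"b", "c"}]
diss = dissimilarity(original, sanitized)
```

The original histogram holds nine item occurrences and two of them are removed, so the dissimilarity is 2/9, roughly 22%.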


The proposed pattern-sharing based metrics are the following:

(a) Side-Effect Factor (SEF). Similarly to the measure of misses cost, the side-effect factor is used to quantify the amount of non-sensitive association rules that are removed as an effect of the sanitization process. It is defined as follows:

$$\mathrm{SEF} = \frac{|P| - \left( |P'| + |R_P| \right)}{|P| - |R_P|}$$
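A minimal Python sketch of the side-effect factor is given below. It assumes the form SEF = (|P| - (|P'| + |R_P|)) / (|P| - |R_P|), with P the rules mined from the original database, P' the rules that remain after sanitization, and R_P the sensitive rules; it further assumes that all sensitive rules were successfully hidden. The function name `side_effect_factor` is hypothetical.

```python
def side_effect_factor(rules_original, rules_sanitized, sensitive_rules):
    """Share of non-sensitive rules removed as a side effect of sanitization."""
    p = set(rules_original)
    p_prime = set(rules_sanitized)
    r_p = set(sensitive_rules)
    non_sensitive = len(p) - len(r_p)
    if non_sensitive <= 0:
        raise ValueError("the original rule set has no non-sensitive rules")
    return (len(p) - (len(p_prime) + len(r_p))) / non_sensitive


# Hypothetical example: ten rules, two sensitive, six left after sanitization.
p = {f"r{i}" for i in range(10)}
r_p = {"r0", "r1"}
p_prime = {f"r{i}" for i in range(2, 8)}
sef = side_effect_factor(p, p_prime, r_p)
```

Of the eight non-sensitive rules, two (`r8` and `r9`) disappear along with the sensitive ones, so the side-effect factor is 0.25.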

(b) Recovery Factor (RF). This measure expresses the possibility that an adversary

can recover a sensitive rule based on the non-sensitive ones. The recovery factor

of a pattern takes into account the existence of its subsets. If all the subsets of a

sensitive rule can be recovered from the sanitized dataset, then the recovery of
