Privacy Issues in Association Rule Mining - Frequent Pattern Mining

Database Reference

In-Depth Information

i = 1 |

C x .sup i ≥

∗

(

U i |

)

i = 1

( C x .sup i −

∗|

U i |

)

≥

i = 1

Therefore, checking for support is equivalent to checking if i = 1 ( C x .sup i −

∗

. This is

accomplished by first computing the sum securely, and applying a secure comparison

at the end.

The first site generates a random number x r for each itemset C x , adds that number

to its ( C x .sup i −

U i |

)

≥

0. The challenge is to do this without revealing C x .sup i or

U i |

∗|

U i |

), and sends it to the next site. (All arithmetic is modm ,

where m

, for security purposes.) The random number masks the actual

excess support, so the second site learns nothing about the first site's actual database

size or support. The second site adds its excess support and sends the value on. The

random value now hides both support counts. The last site in the chain now has

≥

∗|

i = 1 ( C x .sup i −

x r (mod m ).

Since the total database size is

∗|

U i |

)

m/ 2, negative summation will be mapped

to some number that is bigger than (or equal to) m/ 2. (

|≤

k mod m .) The

last site needs to test if this sum minus x r modm is less than m/ 2. This can be done

securely using Yao's generic method [ 56 ]. Clearly this algorithm is secure as long

as there is no collusion, as no site can distinguish what it receives from a random

number. Alternatively, the first site can simply send x r to the last site. The last site

learns the actual excess support, but does not learn the support values for any single

site. In addition, if we consider the excess support to be a valid part of the global

result, this method is still secure.

The above basic protocol can be extended to provide privacy in the context

of collusions [ 26 ]. In addition, efficiency could be improved by using fast union

protocols [ 51 ].

−

4.2

Vertically Partitioned Data

The FDM algorithm that was described above can be modified to address the case of

vertically partitioned data. First of all, each site can compute all the locally frequent

itemsets that can be supported based on the items belonging to the local site. Later

on, to check whether an itemset is globally frequent, vertically partitioned data need

to be combined in order to compute the T ∈ U j = 1 I C x i j ⊆ T , where I C x i j ⊆ T is

the indicator function that represents whether transaction T

∈

U contains itemset

C x i j or not.

In the context of two parties, the above equation becomes a simple dot product. For

example, suppose two parties wish to determine if an itemset C x has the minimum

support in the data set, but neither party has data on the entire set C x . Instead, they

Frequent Pattern Mining

Search WWH ::

Custom Search

Home