knowing only the local supports of C_x C_y and C_x C_y C_z, and the size of each database:

$$support(C_x C_y C_z) = \frac{\sum_{i=1}^{sites} support\_count_{C_x C_y C_z}(i)}{\sum_{i=1}^{sites} database\_size(i)}$$

$$support(C_x C_y) = \frac{\sum_{i=1}^{sites} support\_count_{C_x C_y}(i)}{\sum_{i=1}^{sites} database\_size(i)}$$

$$confidence(C_x C_y \Rightarrow C_z) = \frac{support(C_x C_y C_z)}{support(C_x C_y)}$$
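As a minimal sketch of the aggregation above, the following Python fragment combines per-site support counts into a global support and confidence. The function and variable names are hypothetical, chosen only to mirror the symbols in the formulas:

```python
# Sketch: global support and confidence from per-site counts.
# local_counts[i] is support_count(i) at site i; db_sizes[i] is database_size(i).

def global_support(local_counts, db_sizes):
    """Global support = total support count / total database size."""
    return sum(local_counts) / sum(db_sizes)

def confidence(counts_xyz, counts_xy, db_sizes):
    """Confidence of C_x C_y => C_z from per-site support counts."""
    return global_support(counts_xyz, db_sizes) / global_support(counts_xy, db_sizes)

# Example with three sites (illustrative numbers only)
db_sizes = [1000, 2000, 500]
counts_xyz = [60, 150, 40]   # per-site support counts for C_x C_y C_z
counts_xy = [100, 250, 50]   # per-site support counts for C_x C_y

print(global_support(counts_xyz, db_sizes))       # 250/3500
print(confidence(counts_xyz, counts_xy, db_sizes))  # 250/400
```

Note that each site only needs to reveal its support count and database size, which is exactly the disclosure concern the next paragraph raises.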
The above approach protects individuals' data privacy, but it does require that
each site discloses what rules it supports, and how much it supports each potential
global rule. What if this information is sensitive? To address these challenges, cryp-
tographic techniques have been used to develop privacy-preserving distributed ARM
techniques. Below, we provide an overview of basic protocols that have been applied
in different data partitioning scenarios.
4.1 Horizontally Partitioned Data
To construct a privacy-preserving ARM algorithm for horizontally partitioned data, several cryptographic sub-protocols may need to be used. Before we summarize the algorithm proposed in [26] for three or more parties¹, we first discuss a fast algorithm that has been proposed for distributed mining of association rules.
A fast algorithm for distributed ARM is given by Cheung et al. [13]. Their procedure for Fast Distributed Mining of association rules (FDM) is summarized below.
The set of frequent itemsets F(k) consists of all k-itemsets that are globally supported. The set of locally frequent itemsets, LF_i(k), consists of all k-itemsets supported locally at site S_i. GF_i(k) = F(k) ∩ LF_i(k) is the set of globally frequent k-itemsets locally supported at site S_i. The aim of distributed ARM is to find the sets F(k), for all k > 1, and the support counts for these itemsets, and from this to compute association rules with the specified minimum support and confidence.
1. Candidate Sets Generation: Generate candidate sets CG_i(k) based on GF_i(k-1), the itemsets supported by S_i at the (k-1)-th iteration, using the classic apriori candidate generation algorithm. Each site generates candidates based on the intersection of globally frequent (k-1)-itemsets and locally frequent (k-1)-itemsets.
2. Local Pruning: For each C_x ∈ CG_i(k), scan the database U_i at S_i to compute C_x.sup_i. If C_x is locally frequent at S_i, it is included in the LF_i(k) set. Please note that if C_x is globally supported, it must be locally frequent at at least one site.
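The two FDM steps above can be sketched at a single site as follows. This is an illustrative Python fragment with hypothetical names (GF_prev stands for GF_i(k-1), and the transaction database is a list of itemsets); it is not the implementation from [13]:

```python
from itertools import combinations

def apriori_gen(GF_prev, k):
    """Classic apriori candidate generation: join (k-1)-itemsets, then
    prune any candidate that has an infrequent (k-1)-subset."""
    cands = set()
    for a in GF_prev:
        for b in GF_prev:
            u = a | b
            if len(u) == k:
                # every (k-1)-subset of the candidate must itself be frequent
                if all(frozenset(s) in GF_prev for s in combinations(u, k - 1)):
                    cands.add(frozenset(u))
    return cands

def local_pruning(cands, transactions, min_sup_count):
    """Scan the local database; keep candidates that are locally frequent,
    recording their local support counts (C_x.sup_i)."""
    locally_frequent = {}
    for c in cands:
        sup = sum(1 for t in transactions if c <= t)
        if sup >= min_sup_count:
            locally_frequent[c] = sup
    return locally_frequent

# Example: k = 2, with GF_i(1) = {{a}, {b}, {c}} and a tiny local database
GF1 = {frozenset({'a'}), frozenset({'b'}), frozenset({'c'})}
txns = [frozenset('ab'), frozenset('abc'), frozenset('bc'), frozenset('ac')]
CG2 = apriori_gen(GF1, 2)                       # candidate 2-itemsets
LF2 = local_pruning(CG2, txns, min_sup_count=2)  # locally frequent 2-itemsets
```

In the full protocol the locally frequent candidates (not the raw counts) are then exchanged among sites to determine global frequency; this exchange is precisely what the cryptographic protocols discussed earlier aim to protect.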
¹ Please see the two-party case discussion given in [26].