knowing only the local supports of C_x C_y and C_x C_y C_z, and the size of each database:

$$support(C_x C_y C_z) = \frac{\sum_{i=1}^{sites} support\_count_{C_x C_y C_z}(i)}{\sum_{i=1}^{sites} database\_size(i)}$$

$$support(C_x C_y) = \frac{\sum_{i=1}^{sites} support\_count_{C_x C_y}(i)}{\sum_{i=1}^{sites} database\_size(i)}$$

$$confidence(C_x C_y \Rightarrow C_z) = \frac{support(C_x C_y C_z)}{support(C_x C_y)}$$
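As a minimal sketch of the aggregation above, the following Python fragment combines per-site support counts into a global support and confidence. The function and variable names are hypothetical, chosen only to mirror the symbols in the formulas:

```python
# Sketch: global support and confidence from per-site counts.
# local_counts[i] is support_count(i) at site i; db_sizes[i] is database_size(i).

def global_support(local_counts, db_sizes):
    """Global support = total support count / total database size."""
    return sum(local_counts) / sum(db_sizes)

def confidence(counts_xyz, counts_xy, db_sizes):
    """Confidence of C_x C_y => C_z from per-site support counts."""
    return global_support(counts_xyz, db_sizes) / global_support(counts_xy, db_sizes)

# Example with three sites (illustrative numbers only)
db_sizes = [1000, 2000, 500]
counts_xyz = [60, 150, 40]   # per-site support counts for C_x C_y C_z
counts_xy = [100, 250, 50]   # per-site support counts for C_x C_y

print(global_support(counts_xyz, db_sizes))       # 250/3500
print(confidence(counts_xyz, counts_xy, db_sizes))  # 250/400
```

Note that each site only needs to reveal its support count and database size, which is exactly the disclosure concern the next paragraph raises.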
The above approach protects individuals' data privacy, but it does require that
each site discloses what rules it supports, and how much it supports each potential
global rule. What if this information is sensitive? To address these challenges, cryp-
tographic techniques have been used to develop privacy-preserving distributed ARM
techniques. Below, we provide an overview of basic protocols that have been applied
in different data partitioning scenarios.
4.1 Horizontally Partitioned Data
To construct a privacy-preserving ARM algorithm for horizontally partitioned data, several cryptographic sub-protocols may need to be used. Before we summarize the algorithm proposed in [26] for three or more parties¹, we first discuss a fast algorithm that has been proposed for distributed mining of association rules.
A fast algorithm for distributed ARM is given by Cheung et al. [13]. Their procedure for Fast Distributed Mining of association rules (FDM) is summarized below.
The set of frequent itemsets F(k) consists of all k-itemsets that are globally supported. The set of locally frequent itemsets, LF_i(k), consists of all k-itemsets supported locally at site S_i. GF_i(k) = F(k) ∩ LF_i(k) is the set of globally frequent k-itemsets locally supported at site S_i. The aim of distributed ARM is to find the sets F(k), for all k > 1, and the support counts for these itemsets, and from this to compute association rules with the specified minimum support and confidence.
1. Candidate Sets Generation: Generate candidate sets CG_i(k) based on GF_i(k-1), the itemsets supported by S_i at the (k-1)-th iteration, using the classic apriori candidate generation algorithm. Each site generates candidates based on the intersection of globally frequent (k-1)-itemsets and locally frequent (k-1)-itemsets.
2. Local Pruning: For each C_x ∈ CG_i(k), scan the database U_i at S_i to compute C_x.sup_i. If C_x is locally frequent at S_i, it is included in the LF_i(k) set. Please note that if C_x is globally supported, it must be locally frequent at at least one site.
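The two FDM steps above can be sketched at a single site as follows. This is an illustrative Python fragment with hypothetical names (GF_prev stands for GF_i(k-1), and the transaction database is a list of itemsets); it is not the implementation from [13]:

```python
from itertools import combinations

def apriori_gen(GF_prev, k):
    """Classic apriori candidate generation: join (k-1)-itemsets, then
    prune any candidate that has an infrequent (k-1)-subset."""
    cands = set()
    for a in GF_prev:
        for b in GF_prev:
            u = a | b
            if len(u) == k:
                # every (k-1)-subset of the candidate must itself be frequent
                if all(frozenset(s) in GF_prev for s in combinations(u, k - 1)):
                    cands.add(frozenset(u))
    return cands

def local_pruning(cands, transactions, min_sup_count):
    """Scan the local database; keep candidates that are locally frequent,
    recording their local support counts (C_x.sup_i)."""
    locally_frequent = {}
    for c in cands:
        sup = sum(1 for t in transactions if c <= t)
        if sup >= min_sup_count:
            locally_frequent[c] = sup
    return locally_frequent

# Example: k = 2, with GF_i(1) = {{a}, {b}, {c}} and a tiny local database
GF1 = {frozenset({'a'}), frozenset({'b'}), frozenset({'c'})}
txns = [frozenset('ab'), frozenset('abc'), frozenset('bc'), frozenset('ac')]
CG2 = apriori_gen(GF1, 2)                       # candidate 2-itemsets
LF2 = local_pruning(CG2, txns, min_sup_count=2)  # locally frequent 2-itemsets
```

In the full protocol the locally frequent candidates (not the raw counts) are then exchanged among sites to determine global frequency; this exchange is precisely what the cryptographic protocols discussed earlier aim to protect.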
¹ Please see the two-party case discussion given in [26].