Database Reference
In-Depth Information
Table 15.1 A binary dataset along with different partitioning schemes
(a) Original dataset
(b) Vertically partitioned
(c) Horizontally
partitioned
Tr# abcd
R 1 1010
R 2 1010
R 3 0000
Tr# abcd
R 1 1010
R 2 1010
R 3 0000
R 4 0101
R 5 1111
R 6 1101
R 7 0000
Tr# ab
R 1 10
R 2 10
R 3 00
R 4 01
R 5 11
R 6 11
R 7 00
Tr# cd
R 1 10
R 2 10
R 3 00
R 4 01
R 5 11
R 6 01
R 7 00
Tr# abcd
R 4 1101
R 5 0000
R 6 1111
R 7 1111
In the case of vertically partitioned data, shown in Table 15.1 b, we assume that
different sites collect information about the same set of entities, but they collect
different feature sets. For example, both a university pay roll and the university's
student health center may collect information about a student.
In the case of horizontally partitioned data, shown in Table 15.1 c, different sites
collect the same set of information about different entities. For example, different
credit card companies may collect credit card transactions of different individuals.
In context of association rules mining [ 5 ], we may try to mine association rules
on the horizontally partitioned data and/or vertically partitioned data. In the case of
horizontally partitioned data, the traditional ARM problem can be stated as follows.
Consider a set of sites S . Each site S i (1
i n ) has a private transaction database
U i where the entire database U is assumed to be of the form U = U 1 U 2 ∪···∪
U n . The itemset C x has local support count of C x .sup i at site S i , if and only if
C x .sup i of the transactions contain C x . The global support count of C x is given
as C x .sup
= i = 1 C x .sup i . An itemset C x is globally supported if C x .sup
× i = 1 |
U i | . Similarly, the global confidence of a rule C x
s
C y can be given
as C x
C y .sup/C x .sup .
In the case of vertically partitioned data, each U i that resides in site S i con-
tains a subset of the columns that represents different items. To compute C x .sup ,
where C x
, we need to somehow combine those columns. If C x is vertically
partitioned such that sites S i 1 ...S i k hold the information about the items that form
C x (i.e., C x
I
=
C x i 1
C x i 2
...C x i k ), to compute C x .sup we need to compute
T U j = 1 I C x i j T , where I C x i j T is the indicator function that represents
whether transaction T U contains itemset C x i j or not.
The main challenge arises if those databases U i belong to different organizations
and direct sharing of U i is not feasible due to privacy concerns. For example, different
credit card companies may not be able to share their data due to financial privacy
regulations. Computing association rules without disclosing individual transactions
is straightforward in the case of horizontally partitioned data. For example, we
can compute the global support and confidence of an association rule C x C y
C z
Search WWH ::




Custom Search