Information Technology Reference
In-Depth Information
Following shows the local pruning procedure of candidate sets.
1. Candidate sets generation: Generate the candidate sets CG i
according to
k
global large itemsets found at (
k
-1)-st iteration at site Si using formula CG i
=
k
Ariori_genGL
.
2. Local pruning: For each
k-1
X CG i
, scan each local database DB i to compute
k
local support total count
X
.sup i . If
X
at site
S
i isn't local large, it will be deleted
from candidate data sets
LL i
. This pruning only deletes
X
from candidate data
k
sets of site
may present at candidate data sets of other sites).
3. Support count exchange: Broadcast the candidate sets in
S
i ,
X
toward other
sites, to collect support count. Compute global support count, and obtain all
global large
LL i
k
i .
4. Broadcast the results of mining: Broadcast all global large
k
-data sets in site
S
k-
itemsets
computed toward other sites.
In the above processes for finding globally large candidate itemsets, step 2
“local pruning” and step 3 “support count exchange”, each site
S i should has two
support count sets. For local pruning,
S i should find out the local support counts
of its local candidate sets CG i(k) . For support count exchange,
S i should find out
different local support count with global support count sets in other sites. A
simple approach would be to scan two times, firstly find out local support count
basing on local candidate CG i(k) ; secondly respond to quests of support count
from other sites. However, this would substantially degrade the performance.
In fact, it is unnecessary to scan database two times. At site
i , before k-th
iteration, CG i(k) was obtained, some other relative sets also could be obtained. For
example, CG j(k) (
S
j
=1,…,
n
,
j i
). Because all GL i
i
=1,…,
n were broadcasted
k-1
toward each site,and candidate GL i
i
=1,…,
n could be computed from
k-1
corresponding GL i
-1)-st iteration. That is, because all global large
data sets created by previous iteration were broadcast toward all sites, at the
beginning of iteration.each site could compute candidate itemsets in other
sites.Therefore, local support count of all candidate itemsets could be obtained in
a scan, and be stored in structure similar to Apriori algorithm Hash-tree. Two
different support count sets, can be searched in this structure, in the exchange of
local pruning and support count.
, after (
k
k-1
Search WWH ::




Custom Search