Information Technology Reference
In-Depth Information
12.7.2 Fast Parallel Mining Algorithm
David Cheung and Yongqiao Xiao recently proposed a parallel version of FDM,
called Fast Parallel Mining (Chenug, 1999). The problem with FDM's polling
mechanism is that it requires two rounds of messages in each iteration: one for
computing the global supports and one for broadcasting the frequent itemsets.
This two-round scheme can degrade performance in a parallel setting. FPM
generates fewer candidates and retains the local and global pruning steps. But
instead of count polling and subsequent broadcast of frequent itemsets, it simply
broadcasts local supports to all processors. The more interesting aspect of this
work is a metric Cheung and Xiao define for data skewness (the distribution of
itemsets among the various partitions). For an itemset X, let pX(i) denote the
probability that X occurs in partition i. The entropy of X is given as
n
= − Ã
H X
(
)
pX i
( ) log(
pX i
( ))
(12.8)
i
The entropy easures the distribution of the local support counts of X among
all partitions. FPM is an enhancement of CD. The simple support counts
exchange scheme in CD is retained in FPM. The main difference is the
incorporation of both the distributed and global prunings in FPM to reduce the
candidate set size. In FPM the first iteration is the same as CD. Each processor
scans its partition to find out local support counts of all size-1 itemsets and use
one round of count exchange to compute the global support counts. At the end of
the 1-st iteration, in addition to L 1 , each processor also finds out the gl-large
itemsets GL 1 (
i
), for 1≤
i
n
. For the
k
-th iteration of FPM,
k
>1, the program
fragment executed at processor
i
, 1≤ i
n
, is described as Algorithm 12.10.
Algorithm 12.10 FPM algorithm.
1. compute candidate sets
n
CG
= G
GL
1( ) )
1 apriori_gen(
apriori_gen(
apriori_gen( ; (distributed
k
i
=
k
i
pruning)
2. prune candidates in CGk by global pruning
3. scan partition
D i to find the local support count
X
.sup(
i
) for all remaining
candidates
X CGk
4. exchange {
X
.sup(
i
)|
X CGk } with all other processors to get global
support counts
X
.sup(
i
) for all
X CGk
Search WWH ::




Custom Search