Information Technology Reference
In-Depth Information
Table 12.3 shows the essential differences among the different methods and
groups together related algorithms. As you can see, there are only a handful of
distinct paradigms. The other algorithms propose optimizations over these basic
cases. For example, PEAR, PDM, NPA, FDM, FPM, and CCPD are all similar to
Count Distribution. Likewise, SPA, IDD, and PCCD are similar to Data
Distribution, whereas HPA and HPAELD are similar to Candidate Distribution.
Hybrid Distribution combines Count and Data Distribution techniques. ParEclat,
ParMaxEclat, ParClique, and ParMaxClique are all based on their sequential
counterparts. Finally, APM is based on the sequential DIC method, and PPAR is
based on Partition.
12.7.1 Count Distribution Algorithm
Count Distribution(CD) is a parallel version of Apriori. It is one of the earliest
proposed and representative parallel algorithms for mining of association
rules(Agrawal et al, 1996) . We describe here briefly its steps. The database
D
is
partitioned into
D 1 , D 2 , …, D n and distributed across
n
processors. The program
fragment of CD at processor
p i , 1≤
i
n
, for the
k
-th iteration is outlined. For
convenience, we use
X
.sup (i) to represent the local support count of an itemset
X
in partition
D i . In step 1, every processor computes the same candidate set
C
k by
applying the aprior_gen function on
L k -1 , which is the set of large itemsets found
at the (
k
-1)-th iteration. In step 2, local support counts (support in
D i ) of
candidates in
C k are found. In steps 3 and 4, local support counts are exchanged
with all other processors to get global support counts (support in
D
) and globally
large itemsets (large with respect to
L k are computed independently by each
processor. CD repeats steps 1-4 until no more candidate is found.
D
)
Algorithm 12.9 Count Distribution algorithm
1. C k = apriori_gen( L k -1 );
2. scan partition D i to find the local support count X . sup ( i ) for all X C k ;
3. exchange { X . sup ( i )| X C k } with all other processors to get global support
counts X . sup ( i )
for all X C k ;
4. L k = { X C k | X . sup ( i ) minsup × |D|}.
Search WWH ::




Custom Search