Association Rules - Advanced Artificial Intelligence

Information Technology Reference

In-Depth Information

Table 12.3 shows the essential differences among the different methods and

groups together related algorithms. As you can see, there are only a handful of

distinct paradigms. The other algorithms propose optimizations over these basic

cases. For example, PEAR, PDM, NPA, FDM, FPM, and CCPD are all similar to

Count Distribution. Likewise, SPA, IDD, and PCCD are similar to Data

Distribution, whereas HPA and HPAELD are similar to Candidate Distribution.

Hybrid Distribution combines Count and Data Distribution techniques. ParEclat,

ParMaxEclat, ParClique, and ParMaxClique are all based on their sequential

counterparts. Finally, APM is based on the sequential DIC method, and PPAR is

based on Partition.

12.7.1 Count Distribution Algorithm

Count Distribution(CD) is a parallel version of Apriori. It is one of the earliest

proposed and representative parallel algorithms for mining of association

rules(Agrawal et al, 1996) . We describe here briefly its steps. The database

D

is

partitioned into

D 1 , D 2 , …, D n and distributed across

n

processors. The program

fragment of CD at processor

p i , 1≤

i ≤

n

, for the

k

-th iteration is outlined. For

convenience, we use

X

.sup (i) to represent the local support count of an itemset

X

in partition

D i . In step 1, every processor computes the same candidate set

C

k by

applying the aprior_gen function on

L k -1 , which is the set of large itemsets found

at the (

k

-1)-th iteration. In step 2, local support counts (support in

D i ) of

candidates in

C k are found. In steps 3 and 4, local support counts are exchanged

with all other processors to get global support counts (support in

D

) and globally

large itemsets (large with respect to

L k are computed independently by each

processor. CD repeats steps 1-4 until no more candidate is found.

D

)

Algorithm 12.9 Count Distribution algorithm

1. C k = apriori_gen( L k -1 );

2. scan partition D i to find the local support count X . sup ( i ) for all X ∈ C k ;

3. exchange { X . sup ( i )| X ∈ C k } with all other processors to get global support

counts X . sup ( i )

for all X ∈ C k ;

4. L k = { X ∈ C k | X . sup ( i ) ≥ minsup × |D|}.

Advanced Artificial Intelligence

Search WWH ::

Custom Search

Home