Association Rules - Advanced Artificial Intelligence

Information Technology Reference

In-Depth Information

Following shows the local pruning procedure of candidate sets.

1. Candidate sets generation: Generate the candidate sets CG i

according to

k

global large itemsets found at (

k

-1)-st iteration at site Si using formula CG i

=

k

Ariori_genGL

.

2. Local pruning: For each

k-1

X ∈ CG i

, scan each local database DB i to compute

k

local support total count

X

.sup i . If

X

at site

S

i isn't local large, it will be deleted

from candidate data sets

LL i

. This pruning only deletes

X

from candidate data

k

sets of site

may present at candidate data sets of other sites).

3. Support count exchange: Broadcast the candidate sets in

S

i ,

X

toward other

sites, to collect support count. Compute global support count, and obtain all

global large

LL i

k

i .

4. Broadcast the results of mining: Broadcast all global large

k

-data sets in site

S

k-

itemsets

computed toward other sites.

In the above processes for finding globally large candidate itemsets, step 2

“local pruning” and step 3 “support count exchange”, each site

S i should has two

support count sets. For local pruning,

S i should find out the local support counts

of its local candidate sets CG i(k) . For support count exchange,

S i should find out

different local support count with global support count sets in other sites. A

simple approach would be to scan two times, firstly find out local support count

basing on local candidate CG i(k) ; secondly respond to quests of support count

from other sites. However, this would substantially degrade the performance.

In fact, it is unnecessary to scan database two times. At site

i , before k-th

iteration, CG i(k) was obtained, some other relative sets also could be obtained. For

example, CG j(k) (

S

j

=1,…,

n

,

j ≠ i

). Because all GL i

i

=1,…,

n were broadcasted

k-1

toward each site,and candidate GL i

i

=1,…,

n could be computed from

k-1

corresponding GL i

-1)-st iteration. That is, because all global large

data sets created by previous iteration were broadcast toward all sites, at the

beginning of iteration.each site could compute candidate itemsets in other

sites.Therefore, local support count of all candidate itemsets could be obtained in

a scan, and be stored in structure similar to Apriori algorithm Hash-tree. Two

different support count sets, can be searched in this structure, in the exchange of

local pruning and support count.

, after (

k

k-1

Advanced Artificial Intelligence

Search WWH ::

Custom Search

Home