Association Rules - Advanced Artificial Intelligence

Information Technology Reference

In-Depth Information

3. {scan one more interval on D i and count the supports of the itemsets on the

trie;

4. found out the locally large itemsets among the itemsets on the trie for the interval

scanned;

5. generate new candidates from these locally large itemsets;

6. perform pruning on these candidates and insert the survivors into the trie;

7. remove globally small itemsets on the trie;

8. }

9. }

Every processor in APM performs the dynamic generation and counting. The

new candidates generated by each process go through a pruning similar to the

partition pruning before the survivors are inserted into the trie. Before a

processor starts a new round of counting on the next interval,it traverses the trie

to remove itemsets which are found to be globally small. This will keep the size

of the trie minimum. If all the processors have counted all the itemsets on the trie,

then APM terminates.

12.7.4 Data skewness and workload balance

In a database partition, two data distribution characteristics, data skewness and

workload balance, have orthogonal effects on prunings and hence performance of

FPM. Intuitively, the data skewness of a partitioned database is high if the

supports of most large itemsets are clustered in a few partitions. It is low if the

supports of most large itemsets are distributed evenly across the processors.

David Cheung and Yongqiao Xiao have developed a skewness metric based

on the well established notion of entropy. Given a random variable

, it's entropy

is a measurement on how even or uneven its probability distribution is over its

values. If a database is partitioned over n processors, the value

X

X sup i

.

( )

px i

( )

=

X sup

.

can be regarded as the probability of occurrence of an itemset

X

in partition

D i ,

n

i

Ã

is a measurement of the

1≤

i ≤

n.

The entropy

H(X)

= −

1 (

px i

( )

×

log(

px i

( )))

=

distribution of the local supports of

X

over the partitions For example, if

X

is

skewed completely into a single partition

D

k , (1≤

k ≤

n

) , i.e., it only occurs in Dk,

then

p x (

k

) =1 and

p

x (

i

) =0, ∀ i

¬

k

. The value of H(

X

) = 0 is the minimal in this

Advanced Artificial Intelligence

Search WWH ::

Custom Search

Home