records and reducing the total number of records. The sampling algorithm relies on the concept of the negative boundary, which exploits the downward closure of frequent itemsets. Suppose S is a downward closed subset of the power set of an itemset I. If every subset of an itemset s is subsumed by S while s itself is not subsumed by S, then s is an element of the negative boundary; all itemsets with this property form the negative boundary, marked as Bd(S). For example, if I = {A,B,C,D,E} and S = {{A},{B},{C},{E},{A,B},{A,C},{A,E},{C,E},{A,C,E}}, then Bd(S) = {{B,C},{B,E},{D}}. S is a superset of the high-frequency itemsets if none of the sets in Bd(S) has high frequency.
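To make the definition concrete, the following sketch (not part of the original text) enumerates Bd(S) for the example above by checking, for every itemset over I, whether it lies outside S while all of its immediate subsets lie inside S; the function and variable names are our own.

```python
from itertools import combinations

def negative_border(items, S):
    """Negative boundary Bd(S): itemsets not in S all of whose
    immediate subsets are in S (S is assumed downward closed)."""
    S = {frozenset(x) for x in S}
    S.add(frozenset())          # the empty set counts as trivially frequent
    border = set()
    for k in range(1, len(items) + 1):
        for cand in combinations(sorted(items), k):
            cand = frozenset(cand)
            if cand not in S and all(frozenset(sub) in S
                                     for sub in combinations(cand, k - 1)):
                border.add(cand)
    return border

# The example from the text: I = {A,B,C,D,E} with the given S.
I = {"A", "B", "C", "D", "E"}
S = [{"A"}, {"B"}, {"C"}, {"E"},
     {"A", "B"}, {"A", "C"}, {"A", "E"}, {"C", "E"}, {"A", "C", "E"}]
print(sorted(sorted(x) for x in negative_border(I, S)))
# prints [['B', 'C'], ['B', 'E'], ['D']]
```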
For the database DB in Figure 12.1, we can get the candidate itemsets C1 by counting the number of occurrences of every item during the first scan of the whole set of transactions. If the minimum support is 2, the one-dimension large itemsets L1 are produced. Because every subset of a large itemset must itself satisfy the minimum support, Apriori calculates C2 using L1 * L1, where the operation * is defined as

Lk * Lk = { X ∪ Y | X, Y ∈ Lk, |X ∩ Y| = k − 1 }

The candidate itemsets C2 are thus produced, and L2 is obtained by applying the minimum support. From L2 to C3, the two itemsets with the same head item, {B,C} and {B,E}, are joined, and the rear itemset {C,E} is tested against the minimum support. Because {C,E} satisfies the minimum support, all two-dimension subsets of {B,C,E} are large itemsets, so {B,C,E} is a candidate itemset; C3 is confirmed and L3 is obtained. Since no higher-dimension candidates can be generated from L3, there are no higher-dimension itemsets any more, and the whole set of large itemsets is confirmed.
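The candidate generation just described can be sketched as follows. Since the transactions of Figure 12.1 are not reproduced in this excerpt, the small database below is an assumed stand-in chosen so that, as in the text, {B,C}, {B,E} and {C,E} are among the frequent pairs and {B,C,E} is the only three-dimension large itemset; the join over Lk and the support filtering follow the * operation defined above.

```python
from itertools import combinations

def apriori_gen(Lk, k):
    """Join Lk with itself (sets sharing k-1 items), then prune any
    candidate that has an infrequent k-subset."""
    Lk = set(Lk)
    candidates = set()
    for X in Lk:
        for Y in Lk:
            if X != Y and len(X & Y) == k - 1:       # |X ∩ Y| = k-1
                cand = X | Y                          # X ∪ Y, size k+1
                if all(frozenset(sub) in Lk
                       for sub in combinations(cand, k)):
                    candidates.add(cand)
    return candidates

def support_filter(candidates, transactions, minsup):
    """Keep the candidates whose support reaches minsup."""
    return {c for c in candidates
            if sum(c <= t for t in transactions) >= minsup}

# Assumed stand-in for the transactions of Figure 12.1.
transactions = [frozenset(t) for t in
                ({"A", "C", "D"}, {"B", "C", "E"},
                 {"A", "B", "C", "E"}, {"B", "E"})]
minsup = 2

items = {i for t in transactions for i in t}
Lk = support_filter({frozenset({i}) for i in items}, transactions, minsup)
k = 1
while Lk:
    print(f"L{k}:", sorted(sorted(x) for x in Lk))
    Lk = support_filter(apriori_gen(Lk, k), transactions, minsup)  # Lk+1
    k += 1
```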
Besides the Apriori algorithm, the sampling algorithm and the DIC algorithm are also classical association rule mining algorithms. The main idea of the sampling algorithm is to define a value lowsup that is less than minsup and, with the help of sampling, to calculate on the sampled data db the itemsets X with Support_db(X) ≥ lowsup. This collection of itemsets is marked as S and is supposed to be a superset of the frequent itemsets. The negative boundary Bd(S) is then calculated, DB is scanned, and the supports of S and Bd(S) are computed. If no set in Bd(S) is a frequent itemset, then S is indeed a superset of the frequent itemsets. Otherwise failure is reported: the frequent itemsets found in Bd(S) are added into S, the negative boundary of the enlarged S is calculated, DB is scanned again, and so on, until no frequent itemsets can be added into S.
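A minimal sketch of this loop is given below; it reuses the negative_border helper from the first sketch, and the brute-force sample miner, the list representation of DB, and the parameters lowsup and sample_size are illustrative assumptions rather than the prescribed implementation.

```python
import random
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions containing `itemset`."""
    return sum(itemset <= t for t in transactions)

def mine_sample(sample, items, lowsup):
    """Brute-force pass over the small sample: every itemset whose
    support in the sample reaches the lowered threshold lowsup."""
    return {frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(sorted(items), k)
            if support(frozenset(c), sample) >= lowsup}

def sampling_algorithm(DB, items, minsup, lowsup, sample_size):
    """Mine a random sample with lowsup, then verify S and Bd(S) against
    DB (a list of frozenset transactions), expanding S until no set in
    the negative boundary is frequent."""
    S = mine_sample(random.sample(DB, sample_size), items, lowsup)
    while True:
        # One scan of DB gives the supports of Bd(S) (negative_border is
        # the helper defined in the first sketch above).
        missed = {c for c in negative_border(items, S)
                  if support(c, DB) >= minsup}
        if not missed:
            # Nothing frequent was overlooked: the truly frequent members
            # of S are the answer.
            return {c for c in S if support(c, DB) >= minsup}
        # Otherwise failure is reported; the missed frequent itemsets are
        # added to S, the boundary recomputed, and DB scanned again.
        S |= missed
```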