Background - Association Rule Hiding for Data Mining - page 10

Database Reference

In-Depth Information

support I 2 . An itemset I is called large or frequent in database D, if and only if,

its frequency in D is at least equal to a minimum frequency threshold mfreq, set

by the owner of the data. Equivalently, I is large in D, if and only, sup(I; D)

msup, where msup = mfreqN. The set of frequent itemsets for a database D is

denoted as F D =fI I : freq(I; D). All the itemsets having a frequency lower than

mfreq (equivalently a support lower than msup) are called infrequent and their set is

Ã(I)F D .

The second step in the process of association rule mining typically involves the

identification of all the significant association rules that hold among the derived

frequent itemsets. An association rule I ) J is significant if it holds in database D

with a confidence that is higher than a minimum confidence threshold mconf set by

the owner of the data, i.e. when sup (I[J;D)

sup (I;D) mconf . The support of this rule in D

is equal to that of its generating itemset, i.e. is equal to sup(I[J; D). An example of

association rule mining (borrowed from [72]) is shown on Fig. 2.1.

Rules

Confidence Support

Itemset Support

a

b ) a

100%

4

tid Itemset

T 1

6

b ) c

75%

3

abc

b

4

c ) a

100%

4

T 2

abc

c

4

c ) b

75%

3

T 3

abc

ab

4

b ) ac

75%

3

T 4

ab

ac

4

c ) ab

75%

3

T 5

a

bc

3

ab ) c

75%

3

T 6

ac

abc

3

ac ) b

75%

3

bc ) a

100%

3

Fig. 2.1: Database D along with its itemsets and related association rules.

Border Theory

The theory of the borders of the frequent itemsets is also very important in our

discussion as both border-based and exact hiding methodologies rely on a pro-

cess inspired from this theory. Let F D be the set of all frequent itemsets in D, and

P =Ã(I) be the set of all patterns in the lattice of D (e.g., see Fig. 2.2). The pos-

itive border of F D , denoted as Bd + (F D ), consists of all the maximally frequent

patterns in P, i.e. all the patterns in F D , whose all proper supersets are infrequent.

Formally, Bd + (F D ) = fI 2F D j for all J 2P with I J we have that J 2F D g.

Respectively, the negative border of F D , denoted as Bd (F D ), consists of all the

minimally infrequent patterns in P, i.e. all the patterns in PnF D , whose all proper

subsets are frequent. Formally, Bd (F D ) =fI 2PnF D j for all J I we have that

2 We will use sup(I) and freq(I) instead of sup(I; D) and freq(I; D), respectively, for notational

convenience, when database D is obvious in our context.

Next Page

Association Rule Hiding for Data Mining

Search WWH ::

Custom Search

Home