Uncertain Frequent Pattern Mining - Frequent Pattern Mining


Probabilistic database D2 of uncertain data:

D2 = { t1 = {a:0.2, b:0.9, c:0.4},
       t2 = {a:0.6, b:0.6, c:0.6, d:0.9},
       t3 = {a:0.6, b:0.5, d:0.5, e:0.7},
       t4 = {a:0.9, b:0.2, c:0.8, e:0.3} }

Samples of instantiated possible worlds for D2 containing uncertain data:

[Table: for each of Sample 1, Sample 2, and Sample 3, the tID list of every item]

Observations:

• expSup({e}, D2) = 1.0 for D2 vs. avg(sup({e}, Sample S)) = 1.0 over the 3 samples
• expSup({a,c}, D2) = 1.16 for D2 vs. avg(sup({a,c}, Sample S)) ≈ 2.33 over the 3 samples
• expSup({b,d}, D2) = 0.79 for D2 vs. avg(sup({b,d}, Sample S)) ≈ 0.67 over the 3 samples

where S = 1, 2, 3

Fig. 14.12 Samples of instantiated “possible worlds” for D2
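The observations in Fig. 14.12 can be checked with a short sketch. It assumes that items within a transaction are independent, so that the expected support of an itemset in one transaction is the product of its items' existential probabilities (consistent with the product rule this section uses for augmented tIDsets). The names `D2`, `exp_sup`, `sample_world`, and `avg_sampled_support` are illustrative and do not come from the text.

```python
import random
from math import prod

# Probabilistic dataset D2 from Fig. 14.12: tID -> {item: existential probability}
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}

def exp_sup(itemset, db):
    """Expected support: sum over transactions of the product of item probabilities.

    Assumes independent items; an item absent from a transaction has probability 0.
    """
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in db.values())

def sample_world(db, rng):
    """Draw one possible world: keep each item with its existential probability."""
    return {tid: {x for x, p in t.items() if rng.random() < p}
            for tid, t in db.items()}

def avg_sampled_support(itemset, db, n_samples, seed=0):
    """Average the ordinary support of `itemset` over `n_samples` sampled worlds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        world = sample_world(db, rng)
        total += sum(1 for items in world.values() if set(itemset) <= items)
    return total / n_samples

for pattern in ({"e"}, {"a", "c"}, {"b", "d"}):
    print(sorted(pattern),
          round(exp_sup(pattern, D2), 2),
          round(avg_sampled_support(pattern, D2, 20000), 2))
```

With only three samples, as in the figure, the average support can differ noticeably from the expected support (for example, 2.33 vs. 1.16 for {a,c}); drawing many more worlds should bring the two values close together.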

Table 14.8 Augmented tIDsets for domain items in the probabilistic dataset D2

Item    Augmented tIDset
a       {t1:0.2, t2:0.6, t3:0.6, t4:0.9}
b       {t1:0.9, t2:0.6, t3:0.5, t4:0.2}
c       {t1:0.4, t2:0.6, t4:0.8}
d       {t2:0.9, t3:0.5}
e       {t3:0.7, t4:0.3}
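As a minimal sketch of how this vertical representation might be derived, the following code transposes the horizontal dataset D2 of Fig. 14.12 into one augmented tIDset per item. The function name `to_augmented_tidsets` and the dictionary layout are assumptions made for illustration; the text does not prescribe a data structure.

```python
from collections import defaultdict

# Horizontal form of D2: tID -> {item: existential probability}
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}

def to_augmented_tidsets(db):
    """Transpose the dataset: item -> {tID: existential probability of the item}."""
    tidsets = defaultdict(dict)
    for tid, transaction in db.items():
        for item, prob in transaction.items():
            tidsets[item][tid] = prob
    return dict(tidsets)

tidsets = to_augmented_tidsets(D2)
for item in sorted(tidsets):
    print(item, tidsets[item])
```

Each printed row should correspond to one row of Table 14.8.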

In other words, the resulting augmented tIDset for any item x is of the form {t_i : P(x, t_i)}, which is equivalent to {t_i : expSup({x}, t_i)}. See Table 14.8.

With the use of augmented tIDsets to vertically represent the probabilistic dataset D of uncertain data, the expected support of any 1-itemset {x} in D can be computed by summing all P(x, t_i) values in the augmented tIDset for {x}. The tIDset of any (k+1)-itemset X ≡ Y ∪ {z} (where Y is a k-pattern and z is an item) for k ≥ 1 can be formed by intersecting the tIDsets of Y and {z}. Each t_i in the intersection result is associated with an expected support value expSup(X, t_i), which is the product of expSup(Y, t_i) and P(z, t_i).
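The summation and intersection steps can be sketched as follows, using the augmented tIDsets of Table 14.8. The helper names `intersect` and `exp_sup` are hypothetical, and the code only illustrates the product rule described above; it is not a full UV-Eclat implementation.

```python
# Augmented tIDsets from Table 14.8: item -> {tID: P(item, tID)}
TIDSETS = {
    "a": {"t1": 0.2, "t2": 0.6, "t3": 0.6, "t4": 0.9},
    "b": {"t1": 0.9, "t2": 0.6, "t3": 0.5, "t4": 0.2},
    "c": {"t1": 0.4, "t2": 0.6, "t4": 0.8},
    "d": {"t2": 0.9, "t3": 0.5},
    "e": {"t3": 0.7, "t4": 0.3},
}

def intersect(tidset_y, tidset_z):
    """Augmented tIDset of X = Y ∪ {z}.

    Keeps only tIDs present in both inputs and sets
    expSup(X, t) = expSup(Y, t) * P(z, t).
    """
    return {tid: p * tidset_z[tid]
            for tid, p in tidset_y.items() if tid in tidset_z}

def exp_sup(tidset):
    """Expected support of a pattern: sum of the values in its augmented tIDset."""
    return sum(tidset.values())

ac = intersect(TIDSETS["a"], TIDSETS["c"])   # 2-itemset {a, c}
bd = intersect(TIDSETS["b"], TIDSETS["d"])   # 2-itemset {b, d}
abd = intersect(bd, TIDSETS["a"])            # extend the 2-pattern {b, d} with item a

print(round(exp_sup(TIDSETS["e"]), 2))  # 1-itemset {e}
print(round(exp_sup(ac), 2))
print(round(exp_sup(bd), 2))
print(round(exp_sup(abd), 3))
```

Because every intersection result is itself an augmented tIDset, the same `intersect` call can be reused to grow a k-pattern into a (k+1)-pattern, as the `abd` line does.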

9.3 U-VIPER: An Exact Algorithm

Vertical representations for a probabilistic dataset D of uncertain data are not confined

to set-based representations (e.g., augmented tIDsets used in UV-Eclat). There are
