Probabilistic database D2 of uncertain data:
D2 = { t1 = {a:0.2, b:0.9, c:0.4},
       t2 = {a:0.6, b:0.6, c:0.6, d:0.9},
       t3 = {a:0.6, b:0.5, d:0.5, e:0.7},
       t4 = {a:0.9, b:0.2, c:0.8, e:0.3} }

Samples of instantiated possible worlds for D2 containing uncertain data (tID list of each item):

Item   Sample 1            Sample 2            Sample 3
a      {t1, t2, t3, t4}    {t1, t2, t4}        {t2, t4}
b      {t1, t2}            {t1, t2}            {t1, t2, t4}
c      {t2, t4}            {t1, t2, t4}        {t2, t4}
d      {}                  {t2, t3}            {t2}
e      {t3}                {t3}                {t4}

Observations:
expSup({e}, D2) = 1.0 for D2 vs. avg(sup({e}, Sample S)) = 1.0 over the 3 samples
expSup({a,c}, D2) = 1.16 for D2 vs. avg(sup({a,c}, Sample S)) = 2.33 over the 3 samples
expSup({b,d}, D2) = 0.79 for D2 vs. avg(sup({b,d}, Sample S)) = 0.67 over the 3 samples
where S = 1, 2, 3

Fig. 14.12 Samples of instantiated “possible worlds” for D2
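As a check on the observations in Fig. 14.12, the short Python sketch below recomputes both columns. It assumes, consistently with the product rule used for augmented tIDsets later in this section, that the expected support of an itemset X is the sum over all transactions of the product of P(x, t_i) for the items x in X. The three sampled worlds are transcribed from the figure, and the names `D2`, `SAMPLES`, `exp_sup`, and `avg_sample_sup` are illustrative only.

```python
from math import prod

# Probabilistic database D2 from Fig. 14.12:
# transaction ID -> {item: existential probability P(item, t_i)}.
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}

# tID list of each item in the three sampled possible worlds (Fig. 14.12).
SAMPLES = [
    {"a": {"t1", "t2", "t3", "t4"}, "b": {"t1", "t2"}, "c": {"t2", "t4"},
     "d": set(), "e": {"t3"}},
    {"a": {"t1", "t2", "t4"}, "b": {"t1", "t2"}, "c": {"t1", "t2", "t4"},
     "d": {"t2", "t3"}, "e": {"t3"}},
    {"a": {"t2", "t4"}, "b": {"t1", "t2", "t4"}, "c": {"t2", "t4"},
     "d": {"t2"}, "e": {"t4"}},
]

def exp_sup(itemset, db):
    """Hypothetical helper: sum over transactions of the product of P(x, t_i).
    An item missing from a transaction contributes probability 0."""
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in db.values())

def avg_sample_sup(itemset, samples):
    """Hypothetical helper: average actual support over the sampled worlds,
    where support is the size of the intersection of the items' tID lists."""
    total = 0
    for world in samples:
        tids = set.intersection(*(world[x] for x in itemset))
        total += len(tids)
    return total / len(samples)

for X in (["e"], ["a", "c"], ["b", "d"]):
    print(X, round(exp_sup(X, D2), 2), round(avg_sample_sup(X, SAMPLES), 2))
```

Running it should print the same pairs as the figure (1.0 vs. 1.0, 1.16 vs. 2.33, 0.79 vs. 0.67), which makes the gap between the expected support and an average over only three sampled worlds easy to see for {a, c}.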
Table 14.8 Augmented tIDsets for domain items in the probabilistic dataset D2

Item   Augmented tIDset
a      {t1:0.2, t2:0.6, t3:0.6, t4:0.9}
b      {t1:0.9, t2:0.6, t3:0.5, t4:0.2}
c      {t1:0.4, t2:0.6, t4:0.8}
d      {t2:0.9, t3:0.5}
e      {t3:0.7, t4:0.3}
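Table 14.8 is D2 turned from a horizontal (transaction-centric) layout into a vertical (item-centric) one. A minimal sketch of that conversion, assuming D2 is stored as a Python dict keyed by transaction ID; the function name `augmented_tidsets` is hypothetical:

```python
from collections import defaultdict

# D2 in horizontal form: transaction ID -> {item: P(item, t_i)}.
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}

def augmented_tidsets(db):
    """Hypothetical helper: build item -> {t_i: P(item, t_i)} for every
    transaction t_i that contains the item."""
    vertical = defaultdict(dict)
    for tid, items in db.items():
        for item, prob in items.items():
            vertical[item][tid] = prob
    return dict(vertical)

for item, tidset in sorted(augmented_tidsets(D2).items()):
    print(item, tidset)
```

A single pass over the transactions suffices, and each inner dict keeps the tIDs in transaction order, so the printed rows line up with Table 14.8.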
In other words, the resulting augmented tIDset for any item x is of the form
$\{t_i : P(x, t_i)\}$, which is equivalent to $\{t_i : \mathrm{expSup}(\{x\}, t_i)\}$. See Table 14.8.
With the use of augmented tIDsets to vertically represent the probabilistic
dataset D of uncertain data, the expected support of any 1-itemset $\{x\}$ in D can
be computed by summing all $P(x, t_i)$ values in the augmented tIDset for $\{x\}$,
i.e., $\mathrm{expSup}(\{x\}, D) = \sum_{i} P(x, t_i)$. The tIDset of any (k+1)-itemset
$X = Y \cup \{z\}$ (where Y is a k-pattern and z is an item) for $k \geq 1$ can be
formed by intersecting the tIDsets of Y and $\{z\}$. Each $t_i$ in the intersection
result is associated with an expected support value $\mathrm{expSup}(X, t_i)$, which
is the product of $\mathrm{expSup}(Y, t_i)$ and $P(z, t_i)$. For example, intersecting
the augmented tIDsets of {a} and {c} in Table 14.8 yields {t1:0.08, t2:0.36, t4:0.72},
whose values sum to expSup({a,c}, D2) = 1.16, the value reported in Fig. 14.12.
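The two rules above (sum the values of a tIDset; intersect and multiply to extend an itemset) are enough to drive a depth-first search. The Python sketch below is an Eclat-style illustration under stated assumptions, not the published UV-Eclat pseudocode: the item ordering, the user-specified threshold `minsup`, and the helper names `exp_sup`, `extend`, and `mine` are all hypothetical. Note that the product rule treats items within a transaction as independent.

```python
# Augmented tIDsets of D2, transcribed from Table 14.8:
# item -> {t_i: P(item, t_i)}.
TIDSETS = {
    "a": {"t1": 0.2, "t2": 0.6, "t3": 0.6, "t4": 0.9},
    "b": {"t1": 0.9, "t2": 0.6, "t3": 0.5, "t4": 0.2},
    "c": {"t1": 0.4, "t2": 0.6, "t4": 0.8},
    "d": {"t2": 0.9, "t3": 0.5},
    "e": {"t3": 0.7, "t4": 0.3},
}

def exp_sup(tidset):
    """Hypothetical helper: expected support = sum of expSup(X, t_i) values."""
    return sum(tidset.values())

def extend(tidset_y, tidset_z):
    """Hypothetical helper: augmented tIDset of X = Y ∪ {z}; keep each t_i
    present in both tIDsets and attach expSup(Y, t_i) * P(z, t_i)."""
    return {t: p * tidset_z[t] for t, p in tidset_y.items() if t in tidset_z}

def mine(tidsets, minsup):
    """Hypothetical Eclat-style depth-first search over augmented tIDsets.
    Returns {itemset (tuple of items): expected support} for every itemset
    whose expected support is at least minsup."""
    frequent = {}
    items = sorted(tidsets.items())  # fixed item order avoids duplicate itemsets

    def dfs(prefix, prefix_tidset, candidates):
        for i, (z, z_tidset) in enumerate(candidates):
            # A 1-itemset uses its own tIDset; longer itemsets intersect with it.
            x_tidset = extend(prefix_tidset, z_tidset) if prefix else z_tidset
            s = exp_sup(x_tidset)
            # Every P(z, t_i) <= 1, so supersets of an infrequent itemset
            # cannot be frequent; only frequent itemsets are extended.
            if s >= minsup:
                x = prefix + (z,)
                frequent[x] = s
                dfs(x, x_tidset, candidates[i + 1:])

    dfs((), {}, items)
    return frequent

for itemset, s in sorted(mine(TIDSETS, minsup=0.8).items()):
    print("".join(itemset), round(s, 2))
```

With minsup = 0.8 the sketch should report a, b, c, d, e, ab, ac, ad, and bc; {b, d}, at 0.79, falls just short. Keeping each tIDset as a dict makes the intersection and the per-transaction product a single comprehension.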
9.3 U-VIPER: An Exact Algorithm
Vertical representations for a probabilistic dataset D of uncertain data are not confined
to set-based representations (e.g., augmented tIDsets used in UV-Eclat). There are