Probabilistic database D_2 of uncertain data:
D_2 = { t_1 = {a:0.2, b:0.9, c:0.4},
        t_2 = {a:0.6, b:0.6, c:0.6, d:0.9},
        t_3 = {a:0.6, b:0.5, d:0.5, e:0.7},
        t_4 = {a:0.9, b:0.2, c:0.8, e:0.3} }
Samples of instantiated possible worlds for D_2 containing uncertain data:

Item   tID list in Sample 1    tID list in Sample 2    tID list in Sample 3
a      {t_1, t_2, t_3, t_4}    {t_1, t_2, t_4}         {t_2, t_4}
b      {t_1, t_2}              {t_1, t_2}              {t_1, t_2, t_4}
c      {t_2, t_4}              {t_1, t_2, t_4}         {t_2, t_4}
d      {}                      {t_2, t_3}              {t_2}
e      {t_3}                   {t_3}                   {t_4}
Observations:
• expSup({e}, D_2) = 1.0 for D_2 vs. avg(sup({e}, Sample S)) = 1.0 over the 3 samples
• expSup({a,c}, D_2) = 1.16 for D_2 vs. avg(sup({a,c}, Sample S)) ≈ 2.33 over the 3 samples
• expSup({b,d}, D_2) = 0.79 for D_2 vs. avg(sup({b,d}, Sample S)) ≈ 0.67 over the 3 samples
where S = 1, 2, 3
Fig. 14.12 Samples of instantiated "possible worlds" for D_2
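The gap between the expected support and the three-sample averages in Fig. 14.12 can be explored with a small simulation. The following is a minimal sketch, not the chapter's algorithm: it assumes items occur independently within each transaction (consistent with the per-transaction products used for expSup), and the names `D2`, `exp_sup`, `sample_world`, `sup`, and `avg_sup` are hypothetical helpers introduced only for illustration.

```python
import random

# Probabilistic dataset D_2 from Fig. 14.12: tID -> {item: existential probability}.
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}


def exp_sup(itemset, db):
    """Expected support: sum over transactions of the product of item probabilities."""
    total = 0.0
    for items in db.values():
        prob = 1.0
        for x in itemset:
            prob *= items.get(x, 0.0)  # an absent item has probability 0
        total += prob
    return total


def sample_world(db, rng):
    """Instantiate one possible world: keep each item with its own probability
    (assumes independence between items, which is an assumption of this sketch)."""
    return {
        tid: {x for x, p in items.items() if rng.random() < p}
        for tid, items in db.items()
    }


def sup(itemset, world):
    """Ordinary support: number of transactions in the world containing the itemset."""
    return sum(1 for items in world.values() if set(itemset) <= items)


def avg_sup(itemset, db, n_samples, seed=0):
    """Average support of the itemset over n_samples sampled possible worlds."""
    rng = random.Random(seed)
    worlds = (sample_world(db, rng) for _ in range(n_samples))
    return sum(sup(itemset, w) for w in worlds) / n_samples


if __name__ == "__main__":
    for itemset in (("e",), ("a", "c"), ("b", "d")):
        print(itemset, round(exp_sup(itemset, D2), 2), avg_sup(itemset, D2, 20000))
```

With only three samples, as in the figure, the average can be far from the expected support (2.33 vs. 1.16 for {a,c}); with many samples the average should settle close to the expected support.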
Table 14.8 Augmented tIDsets for domain items in the probabilistic dataset D_2

Item   Augmented tIDset
a      {t_1:0.2, t_2:0.6, t_3:0.6, t_4:0.9}
b      {t_1:0.9, t_2:0.6, t_3:0.5, t_4:0.2}
c      {t_1:0.4, t_2:0.6, t_4:0.8}
d      {t_2:0.9, t_3:0.5}
e      {t_3:0.7, t_4:0.3}
other words, the resulting augmented tIDset for any item x is of the form {t_i : P(x, t_i)}, which is equivalent to {t_i : expSup({x}, t_i)}. See Table 14.8.
With the use of augmented tIDsets to vertically represent the probabilistic dataset D of uncertain data, the expected support of any 1-itemset {x} in D can be computed by summing all P(x, t_i) values in the augmented tIDset for {x}. The tIDset of any (k+1)-itemset X ≡ Y ∪ {z} (where Y is a k-pattern and z is an item) for k ≥ 1 can be formed by intersecting the tIDsets of Y and {z}. Each t_i in the intersection result is associated with an expected support value expSup(X, t_i), which is the product of expSup(Y, t_i) and P(z, t_i).
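The vertical computation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the UV-Eclat implementation: the dictionary layout and the names `D2`, `to_augmented_tidsets`, `extend`, and `exp_sup` are hypothetical, and the data is the dataset D_2 of Table 14.8.

```python
# Probabilistic dataset D_2 in horizontal form: tID -> {item: P(item, tID)}.
D2 = {
    "t1": {"a": 0.2, "b": 0.9, "c": 0.4},
    "t2": {"a": 0.6, "b": 0.6, "c": 0.6, "d": 0.9},
    "t3": {"a": 0.6, "b": 0.5, "d": 0.5, "e": 0.7},
    "t4": {"a": 0.9, "b": 0.2, "c": 0.8, "e": 0.3},
}


def to_augmented_tidsets(db):
    """Build the vertical view: item x -> {t_i: P(x, t_i)}."""
    vertical = {}
    for tid, items in db.items():
        for x, p in items.items():
            vertical.setdefault(x, {})[tid] = p
    return vertical


def extend(tidset_y, tidset_z):
    """Augmented tIDset of X = Y ∪ {z}: intersect the tIDs and set
    expSup(X, t_i) = expSup(Y, t_i) * P(z, t_i) for each shared t_i."""
    return {
        tid: value * tidset_z[tid]
        for tid, value in tidset_y.items()
        if tid in tidset_z
    }


def exp_sup(tidset):
    """Expected support of an itemset: sum of the values in its augmented tIDset."""
    return sum(tidset.values())


if __name__ == "__main__":
    vertical = to_augmented_tidsets(D2)
    print(round(exp_sup(vertical["e"]), 2))                   # 1-itemset {e}
    ac = extend(vertical["a"], vertical["c"])                 # 2-itemset {a, c}
    print(sorted(ac), round(exp_sup(ac), 2))
    abc = extend(ac, vertical["b"])                           # 3-itemset {a, b, c}
    print(sorted(abc), round(exp_sup(abc), 3))
```

For {a,c}, the intersection keeps t_1, t_2, and t_4, and the per-transaction products 0.08, 0.36, and 0.72 sum to about 1.16. Because `extend` returns another augmented tIDset, the same function can be applied repeatedly to grow a k-pattern into a (k+1)-itemset.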
9.3 U-VIPER: An Exact Algorithm
Vertical representations for a probabilistic dataset D of uncertain data are not confined
to set-based representations (e.g., augmented tIDsets used in UV-Eclat). There are