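Table 14.9 Vectors for domain items in the probabilistic dataset $D_2$

Vector for every domain item:

$$\vec{a} = \begin{pmatrix} 0.2 \\ 0.6 \\ 0.6 \\ 0.9 \end{pmatrix},\quad
\vec{b} = \begin{pmatrix} 0.9 \\ 0.6 \\ 0.5 \\ 0.2 \end{pmatrix},\quad
\vec{c} = \begin{pmatrix} 0.4 \\ 0.6 \\ 0 \\ 0.8 \end{pmatrix},\quad
\vec{d} = \begin{pmatrix} 0 \\ 0.9 \\ 0.5 \\ 0 \end{pmatrix},\quad\text{and}\quad
\vec{e} = \begin{pmatrix} 0 \\ 0 \\ 0.7 \\ 0.3 \end{pmatrix}.$$
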
other representations. For instance, Leung et al. [44] proposed the U-VIPER algorithm, in which $D$ is vertically represented by a collection of fixed-size vectors, one for each domain item $x$. The length of each vector is fixed and is equal to the number of transactions (i.e., $n$) in the probabilistic dataset of uncertain data. When mining uncertain data, in addition to using a Boolean value (say, 0 or 1) to denote whether or not transaction $t_i$ contains $x$ in the vector, it is also important to capture additional information: If $x$ is likely to be present in a transaction $t_i$, then its associated existential probability $P(x, t_i)$, which expresses the likelihood of $x$ appearing in transaction $t_i$ of the dataset, needs to be captured.
As it would be a waste of space to store $P(x, t_i)$ alongside the Boolean value “1” for $t_i$ (i.e., the $i$-th element of the vector for $x$), U-VIPER replaces the Boolean value “1” by $P(x, t_i)$ as the $i$-th element of each vector $\vec{x}$ representing domain item $x$. Specifically, the $i$-th element of $\vec{x}$ (denoted as $\vec{x}[i]$) stores (i) “0” if $x$ is absent from $t_i$ and (ii) $P(x, t_i)$ if $x$ is likely to be present in $t_i$:
$$\text{$i$-th element of } \vec{x} \text{ (i.e., } \vec{x}[i]\text{)} = \mathit{expSup}(\{x\}, t_i) =
\begin{cases}
0 & \text{if } x \notin t_i \\
P(x, t_i) & \text{if } x \in t_i
\end{cases}
\qquad (14.10)$$
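The construction in Eq. (14.10) can be sketched in a few lines of Python. This is only an illustrative sketch, not the U-VIPER implementation: the function name `build_item_vectors`, the dictionary-per-transaction input format, and the toy dataset are all assumptions made for the example.

```python
from typing import Dict, List


def build_item_vectors(transactions: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Build one length-n vector per domain item, following Eq. (14.10).

    Each transaction is assumed to be a dict mapping an item to its
    existential probability P(x, t_i); absent items are simply not listed.
    """
    n = len(transactions)
    vectors: Dict[str, List[float]] = {}
    for i, transaction in enumerate(transactions):
        for item, prob in transaction.items():
            if not 0.0 < prob <= 1.0:
                raise ValueError(f"existential probability out of range: {prob}")
            # Start from an all-zero vector (item absent everywhere),
            # then store P(x, t_i) at position i.
            vectors.setdefault(item, [0.0] * n)[i] = prob
    return vectors


# Hypothetical toy dataset with three transactions (not taken from the text).
toy_dataset = [
    {"p": 0.5, "q": 0.25},
    {"p": 1.0},
    {"q": 0.75},
]
vectors = build_item_vectors(toy_dataset)
```

Here `vectors["p"]` holds 0.5, 1.0, and 0.0, so position `i` carries either the existential probability or a zero, and no separate Boolean flag is needed.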
See Table 14.9. With this vector-based representation, the expected support of any 1-itemset $\{x\}$ can be computed by summing all non-zero $P(x, t_i)$ values in $\vec{x}$ (i.e., taking the $L_1$-norm of $\vec{x}$):
$$\mathit{expSup}(\{x\}, D) = \sum_{i=1}^{n} \mathit{expSup}(\{x\}, t_i) = \sum_{i=1}^{n} P(x, t_i) = \sum_{i=1}^{n} \vec{x}[i] = \|\vec{x}\|_1 \qquad (14.11)$$
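As a small illustration of Eq. (14.11), the following sketch sums the entries of one item vector. The function name and the sample vector are hypothetical. Because every entry is a probability and therefore non-negative, the plain sum coincides with the $L_1$-norm.

```python
from typing import Sequence


def expected_support_1itemset(x_vec: Sequence[float]) -> float:
    """Expected support of {x} in D: the sum of the entries of its vector."""
    # Zero entries (x absent from t_i) contribute nothing, so summing all
    # entries is the same as summing only the non-zero P(x, t_i) values.
    return sum(x_vec)


# Hypothetical vector for an item x over n = 4 transactions.
x_vec = [0.5, 0.0, 0.25, 1.0]
exp_sup = expected_support_1itemset(x_vec)
```

For this sample vector the result is 1.75.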
The $i$-th element of the vector of any $(k+1)$-itemset $X \equiv Y \cup \{z\}$ (where $Y$ is a $k$-itemset and $z$ is an item) for $k \geq 1$ can be formed by taking the product of $\mathit{expSup}(Y, t_i)$ and $P(z, t_i)$. The expected support of $X$ is then the dot product of $\vec{Y}$ and $\vec{\{z\}}$.
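The extension step can be sketched as follows. The helper names `extend_vector` and `dot`, and the two sample vectors, are assumptions for illustration only; the sketch mirrors the element-wise product and dot product described above rather than any published implementation.

```python
from typing import List, Sequence


def extend_vector(y_vec: Sequence[float], z_vec: Sequence[float]) -> List[float]:
    """Vector of X = Y ∪ {z}: element i is expSup(Y, t_i) * P(z, t_i)."""
    if len(y_vec) != len(z_vec):
        raise ValueError("vectors must have one entry per transaction")
    return [exp_sup_y * p_z for exp_sup_y, p_z in zip(y_vec, z_vec)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have one entry per transaction")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical vectors over n = 4 transactions (not taken from the text).
y_vec = [0.5, 1.0, 0.0, 0.5]   # expSup(Y, t_i) for a k-itemset Y
z_vec = [0.5, 0.25, 1.0, 0.0]  # P(z, t_i) for a single item z

x_vec = extend_vector(y_vec, z_vec)  # vector of the (k+1)-itemset X
exp_sup_x = dot(y_vec, z_vec)        # expected support of X in D
```

Summing `x_vec` gives the same value as `dot(y_vec, z_vec)`, so the expected support of $X$ can be obtained without materializing its vector; the vector itself is only needed when $X$ is extended further.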