Table 14.9 Vectors for domain items in the probabilistic dataset $D_2$
Vector for every domain item
$\vec{a} = [0.2, 0.6, 0.6, 0.9]$, $\vec{b} = [0.9, 0.6, 0.5, 0.2]$, $\vec{c} = [0.4, 0.6, 0, 0.8]$, $\vec{d} = [0, 0.9, 0.5, 0]$, and $\vec{e} = [0, 0, 0.7, 0.3]$.
other representations. For instance, Leung et al. [44] proposed the U-VIPER algorithm, in which $D$ is vertically represented by a collection of fixed-size vectors, one for each domain item $x$. The length of each vector is fixed and equals the number of transactions (i.e., $|D| = n$) in the probabilistic dataset of uncertain data. When mining uncertain data, in addition to using a Boolean value (say, 0 or 1) in the vector to denote whether or not transaction $t_i$ contains $x$, it is also important to capture additional information: if $x$ is likely to be present in a transaction $t_i$, then its associated existential probability $P(x, t_i)$, which expresses the likelihood of $x$ appearing in transaction $t_i$ of the dataset, needs to be captured.
As it would be a waste of space to attach $P(x, t_i)$ to the Boolean value “1” for $t_i$ (i.e., the $i$-th element of the vector for $x$), U-VIPER replaces the Boolean value “1” with $P(x, t_i)$ as the $i$-th element of the vector $\vec{x}$ representing domain item $x$. Specifically, the $i$-th element of $\vec{x}$ (denoted as $\vec{x}[i]$) stores (i) “0” if $x$ is absent from $t_i$ and (ii) $P(x, t_i)$ if $x$ is likely to be present in $t_i$:
\[
\vec{x}[i] = \mathit{expSup}(\{x\}, t_i) =
\begin{cases}
0 & \text{if } x \notin t_i \\
P(x, t_i) & \text{if } x \in t_i
\end{cases}
\tag{14.10}
\]
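The vertical representation defined by Eq. (14.10) can be sketched in a few lines of Python. This is a minimal illustration, not the U-VIPER implementation itself: the function name `build_item_vectors`, the dict-based transaction format, and the toy probabilities are all assumptions made for the example.

```python
def build_item_vectors(transactions):
    """Build one fixed-length probability vector per domain item.

    `transactions` is a list of dicts mapping item -> existential
    probability P(x, t_i). This input format is an assumption made
    for the sketch.
    """
    n = len(transactions)  # every vector has length n = |D|
    items = sorted({item for t in transactions for item in t})
    # Start from all zeros: x[i] = 0 when x is absent from t_i.
    vectors = {item: [0.0] * n for item in items}
    for i, t in enumerate(transactions):
        for item, prob in t.items():
            # x[i] = P(x, t_i) when x is likely to be present in t_i.
            vectors[item][i] = prob
    return vectors


# Toy dataset with made-up probabilities (hypothetical values).
toy_dataset = [
    {"a": 0.2, "b": 0.9},
    {"a": 0.6, "b": 0.6, "d": 0.9},
    {"a": 0.6, "d": 0.5},
]
vectors = build_item_vectors(toy_dataset)
```

Here `vectors["d"]` comes out as `[0.0, 0.9, 0.5]`: a zero where `d` is absent from the transaction, and the existential probability elsewhere.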
See Table 14.9. With this vector-based representation, the expected support of any 1-itemset $\{x\}$ can be computed by summing all non-zero $P(x, t_i)$ values in $\vec{x}$ (i.e., taking the $L_1$-norm of $\vec{x}$):
\[
\mathit{expSup}(\{x\}, D) = \sum_{i=1}^{n} \mathit{expSup}(\{x\}, t_i) = \sum_{i=1}^{n} P(x, t_i) = \sum_{i=1}^{n} \vec{x}[i] = \|\vec{x}\|_1
\tag{14.11}
\]
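As a small sketch of Eq. (14.11), the expected support of a 1-itemset is just the sum of the entries of its vector. Because all entries are probabilities (non-negative), the plain sum coincides with the $L_1$-norm. The function name and the example vector below are illustrative assumptions.

```python
import math


def expected_support_1_itemset(vector):
    """Expected support of {x}: the sum of x[i] over all transactions.

    All entries are probabilities in [0, 1], so this sum equals the
    L1-norm of the vector. Zero entries (absent items) add nothing.
    """
    return math.fsum(vector)


# Illustrative vector for an item over n = 4 transactions (assumed values).
a_vec = [0.2, 0.6, 0.6, 0.9]
exp_sup_a = expected_support_1_itemset(a_vec)
```

For this vector the expected support is about 2.3 (0.2 + 0.6 + 0.6 + 0.9).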
The $i$-th element of the vector of any $(k+1)$-itemset $X = Y \cup \{z\}$ (where $Y$ is a $k$-itemset and $z$ is an item) for $k \geq 1$ can be formed by taking the product of $\mathit{expSup}(Y, t_i)$ and $P(z, t_i)$. The expected support of $X$ is then the dot product of the vectors for $Y$ and $\{z\}$.
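The extension step can be sketched as an element-wise product followed by a sum, which together amount to the dot product described above. This is a hedged illustration rather than the published algorithm: `extend_vector` and `expected_support` are hypothetical helper names, and the vectors hold assumed example values.

```python
import math


def extend_vector(y_vec, z_vec):
    """Vector of X = Y ∪ {z}: entry i is expSup(Y, t_i) * P(z, t_i)."""
    if len(y_vec) != len(z_vec):
        raise ValueError("vectors must cover the same n transactions")
    return [y * z for y, z in zip(y_vec, z_vec)]


def expected_support(vec):
    """Expected support of an itemset: the sum of its vector's entries."""
    return math.fsum(vec)


# Illustrative vectors over n = 4 transactions (assumed values).
a_vec = [0.2, 0.6, 0.6, 0.9]
b_vec = [0.9, 0.6, 0.5, 0.2]
c_vec = [0.4, 0.6, 0.0, 0.8]

ab_vec = extend_vector(a_vec, b_vec)    # vector of the 2-itemset {a, b}
exp_sup_ab = expected_support(ab_vec)   # dot product of a_vec and b_vec

abc_vec = extend_vector(ab_vec, c_vec)  # extend {a, b} with item c
exp_sup_abc = expected_support(abc_vec)
```

With these values, `exp_sup_ab` is approximately 1.02 and `exp_sup_abc` approximately 0.432. Keeping the intermediate vector `ab_vec` lets it be reused when extending to larger itemsets, at the cost of storing one length-$n$ vector per candidate.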
 