Database Reference
In-Depth Information
Table 14.1
An example of a
traditional database
D
1
of
precise data
Transaction ID
Set of items
t
1
{
a
,
b
,
c
}
t
2
{
a
,
b
,
c
,
d
}
t
3
{
a
,
b
,
d
,
e
}
t
4
{
a
,
b
,
c
,
e
}
Table 14.2
An example of a
probabilistic dataset
D
2
of
uncertain data
Transaction ID
Set of items with existential probability
t
1
{
a
:0.2,
b
:0.9,
c
:0.4}
t
2
{
a
:0.6,
b
:0.6,
c
:0.6,
d
:0.9}
t
3
{
a
:0.6,
b
:0.5,
d
:0.5,
e
:0.7}
t
4
{
a
:0.9,
b
:0.2,
c
:0.8,
e
:0.3}
pattern mining algorithms searched traditional databases of precise data (e.g., Ta-
ble
14.1
) such as shopper market basket data, in which the contents of databases are
known. However, we are living in an uncertain world. Uncertain data can be found in
various real-life applications, in which users may not be certain about the presence or
absence of an item
x
in a transaction
t
i
in a probabilistic dataset
D
of uncertain data
(e.g., Table
14.2
). Users may suspect, but cannot guarantee, that
x
is present in
t
i
.
The uncertainty of such suspicion can be expressed in terms of
existential probability
P
(
x
,
t
i
), which indicates the likelihood of
x
being present in
t
i
in
D
. The existential
probability
P
(
x
,
t
i
) ranges from a positive value close to 0 (indicating that
x
has
an insignificantly low chance to be present in
D
) to a value of 1 (indicating that
x
is definitely present). With this notion, each item in any transaction in traditional
databases of precise data (e.g., shopper market basket data) can be viewed as an item
with a 100 % likelihood of being present in such a transaction.
The
probabilistic model
[
1
,
23
] is a key model that commonly used for uncertain
frequent pattern mining. When using the “possible world” interpretation of uncertain
data, there are two possible worlds for an item
x
in a transaction
t
i
:
(i) a possible world
W
1
where
x
is present in
t
i
(i.e.,
x
∈
t
i
)
and
(ii) another possible world
W
2
where
x
is absent from
t
i
(i.e.,
x
t
i
).
Although it is uncertain which of these two worlds is the true world, the probability
of
W
1
to be the true world is
P
(
x
,
t
i
) and the probability of
W
2
to be the true world is
1
∈
P
(
x
,
t
i
). To a further extent, there are multiple items in each of many transactions
in a probabilistic dataset
D
of uncertain data. In a domain of
m
distinct items, when
there are a total of
q
independent items (which include multiple occurrences of
some of the
m
domain items, where
m
−
q
) in all transactions of
D
, there are
O(2
q
) possible worlds. The
expected support
of a pattern
X
in
D
—denoted as
expSup
(
X
,
D
)—can then be computed by summing the support
sup
(
X
,
W
j
)of
X
in possible world
W
j
(while taking into account the probability
Prob
(
W
j
)of
W
j
to