Database Reference
In-Depth Information
Table 14.1 An example of a
traditional database D 1 of
precise data
Transaction ID
Set of items
t 1
{
a , b , c
}
t 2
{
a , b , c , d
}
t 3
{
a , b , d , e
}
t 4
{
a , b , c , e
}
Table 14.2 An example of a
probabilistic dataset D 2 of
uncertain data
Transaction ID
Set of items with existential probability
t 1
{ a :0.2, b :0.9, c :0.4}
t 2
{ a :0.6, b :0.6, c :0.6, d :0.9}
t 3
{ a :0.6, b :0.5, d :0.5, e :0.7}
t 4
{ a :0.9, b :0.2, c :0.8, e :0.3}
pattern mining algorithms searched traditional databases of precise data (e.g., Ta-
ble 14.1 ) such as shopper market basket data, in which the contents of databases are
known. However, we are living in an uncertain world. Uncertain data can be found in
various real-life applications, in which users may not be certain about the presence or
absence of an item x in a transaction t i in a probabilistic dataset D of uncertain data
(e.g., Table 14.2 ). Users may suspect, but cannot guarantee, that x is present in t i .
The uncertainty of such suspicion can be expressed in terms of existential probability
P ( x , t i ), which indicates the likelihood of x being present in t i in D . The existential
probability P ( x , t i ) ranges from a positive value close to 0 (indicating that x has
an insignificantly low chance to be present in D ) to a value of 1 (indicating that x
is definitely present). With this notion, each item in any transaction in traditional
databases of precise data (e.g., shopper market basket data) can be viewed as an item
with a 100 % likelihood of being present in such a transaction.
The probabilistic model [ 1 , 23 ] is a key model that commonly used for uncertain
frequent pattern mining. When using the “possible world” interpretation of uncertain
data, there are two possible worlds for an item x in a transaction t i :
(i) a possible world W 1 where x is present in t i (i.e., x t i )
and
(ii) another possible world W 2 where x is absent from t i (i.e., x
t i ).
Although it is uncertain which of these two worlds is the true world, the probability
of W 1 to be the true world is P ( x , t i ) and the probability of W 2 to be the true world is
1
P ( x , t i ). To a further extent, there are multiple items in each of many transactions
in a probabilistic dataset D of uncertain data. In a domain of m distinct items, when
there are a total of q independent items (which include multiple occurrences of
some of the m domain items, where m
q ) in all transactions of D , there are
O(2 q ) possible worlds. The expected support of a pattern X in D —denoted as
expSup ( X , D )—can then be computed by summing the support sup ( X , W j )of X
in possible world W j (while taking into account the probability Prob ( W j )of W j to
 
Search WWH ::




Custom Search