Database Reference
In-Depth Information
9
Vertical Uncertain Frequent Pattern Mining
The aforementioned candidate generate-and-test based, hyperlinked structure based,
as well as tree-based mining algorithms use horizontal mining , for which a dataset
can be viewed as a collection of transactions. Each transaction is a set of items.
Alternatively, vertical mining can be applied, for which each dataset can be viewed
as a collection of items and their associated lists (or sets or vectors) of transaction IDs
(i.e., tIDs). Each list of tID of an item x represents all the transactions containing
x . With this vertical representation of datasets, the support of a pattern X can be
computed by intersecting the lists of tIDs of items within X .
9.1
U-Eclat: An Approximate Algorithm
To mine frequent patterns using the vertical representation of probabilistic datasets
of uncertain data, Calders et al. [ 17 ] instantiated “possible worlds” of the datasets
to get instantiated samples (in which data become precise) and then applied the
Eclat algorithm [ 55 ] to each of these samples of instantiated databases. The resulting
algorithm is called U-Eclat . Given a probabilistic dataset D of uncertain data, U-
Eclat generates an independent random number r for each item x in a transaction
t i . If the existential probability P ( x , t i ) of item x in transaction t i is no less than
such a random number r (i.e., P ( x , t i )
r ), then x is instantiated and included in a
“precise” sampled database, which is then mined using the original Eclat algorithm.
This sampling and instantiation process is repeated multiple times, and thus generates
multiple sampled “precise” databases. The estimated support of any pattern X is the
average support of X over the multiple sampled databases. Figure 14.12 shows three
sampled databases for probabilistic dataset D 2 . As a sampling-based algorithm, U-
Eclat gains efficiency but loses accuracy. More instantiations (i.e., more samples)
helps improve accuracy, but it comes at the cost of an increase in execution time.
9.2
UV-Eclat: An Exact Algorithm
Alternatively, to avoid instantiations and to directly mine frequent patterns using the
vertical representation of probabilistic datasets containing uncertain data, Budhia
et al. [ 16 ] proposed the UV-Eclat algorithm. When mining uncertain data, in addition
to recording which transactions contain x in the set of tIDs (i.e., tIDsets ), it is
also important to capture additional information: If x is likely to be present in a
transaction t i , then its associated existential probability P ( x , t i )—which expresses
the likelihood of x appearing in transaction t i of the dataset—needs to be captured.
As it would be impractical to build a tIDset for each distinct
domain item, ex-
istential probability value
pair due to the huge number of such pairs, UV-Eclat
builds a tIDset for each domain item x instead. In each set, UV-Eclat augments every
tID (representing t i ) with its corresponding existential probability value P ( x , t i ). In
Search WWH ::




Custom Search