is a popular choice. This means that, in practice, pattern set mining methods and
dynamic modeling have a lot in common; although in iterative ranking no explicit
global goal is defined, in order to find a ranking we iteratively and greedily select
the locally most informative pattern and update the model. A key difference to pattern
set mining is that there we explicitly keep the complexity of the pattern set in check;
we consider both the gain in quality and the cost in complexity over the full set.
(Though pattern set mining is a combinatorial problem, and intuitively optimizing
most of its instances seems very hard, so far theoretical hardness results have only
been established for a handful of cases, including [49, 48, 75].)
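To make this trade-off concrete, the following is a minimal sketch, in Python, of such a greedy selection loop. The quality and complexity functions are hypothetical placeholders for whatever data-fit score and model cost a concrete method defines; this illustrates the general idea only, not any specific algorithm from the literature.

    def greedy_select(candidates, quality, complexity, max_patterns=None):
        """Greedily build a pattern set, trading quality gain against complexity cost.

        quality(S):    fit of the data under the model induced by pattern set S
        complexity(S): cost (e.g. in bits) of describing the pattern set S itself
        Candidate patterns are assumed hashable, e.g. tuples of items.
        """
        selected = []
        current = quality(selected) - complexity(selected)
        while candidates and (max_patterns is None or len(selected) < max_patterns):
            # net gain of adding each remaining candidate to the current set
            gains = {p: quality(selected + [p]) - complexity(selected + [p]) - current
                     for p in candidates}
            best = max(gains, key=gains.get)
            if gains[best] <= 0:      # no candidate improves the overall score
                break
            selected.append(best)
            candidates = [p for p in candidates if p != best]
            current += gains[best]
        return selected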
6.1 Itemsets
KRIMP is among the most well-known pattern set mining algorithms. Siebes et al. [62]
define the best set of itemsets, by the Minimum Description Length principle, as the
set that provides the best lossless compression of the data. Each itemset is assigned
a code word, the length of which depends on how frequently the itemset is used when
greedily covering the data without overlap. The pattern set is then scored by the
number of bits necessary to losslessly describe the data, that is, the sum of the
number of bits needed to encode the dictionary (the itemsets and their code words)
and the number of bits needed to encode the data using these code words.
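As an illustration of such a two-part score, the sketch below computes code lengths from the usage counts of a given non-overlapping cover and sums the model and data costs. The per-item dictionary cost is a crude placeholder, not the exact encoding used by KRIMP [62].

    import math
    from collections import Counter

    def two_part_mdl_score(cover, bits_per_item=8.0):
        """Illustrative two-part MDL score for a non-overlapping cover.

        `cover` is a list of transactions, each given as the list of itemsets
        (tuples of items) used to cover it. Code lengths follow the usual
        -log2(P) rule, with P the relative usage of an itemset in the cover;
        the dictionary cost charged per itemset is a simple placeholder.
        """
        usage = Counter(itemset for transaction in cover for itemset in transaction)
        total = sum(usage.values())
        code_len = {x: -math.log2(usage[x] / total) for x in usage}

        # L(D | M): bits needed to encode the data with the code words
        data_bits = sum(usage[x] * code_len[x] for x in usage)
        # L(M): bits needed to encode the dictionary (itemsets plus code words)
        model_bits = sum(len(x) * bits_per_item + code_len[x] for x in usage)
        return model_bits + data_bits

    # toy cover: three transactions described by two itemsets
    print(two_part_mdl_score([[("a", "b")], [("a", "b"), ("c",)], [("c",)]]))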
The KRIMP algorithm heuristically finds good sets by first mining frequent itemsets
for a given minsup threshold, and then greedily selecting from these in a fixed order.
The resulting pattern sets are typically small (in the order of hundreds of itemsets)
and have been shown to be useful in many data mining tasks [70] (see also Chap. 8).
A number of variants of KRIMP exist for other data types, including one for
low-entropy sets [31]. Siebes and Kersten [61] investigated structure functions, and
proposed the GROEI algorithm for approximating the optimal k-pattern set.
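A rough sketch of that selection loop follows; it glosses over KRIMP's actual candidate order, cover strategy, and pruning, and the compressed_size function is an assumed helper returning the total encoded size, in bits, of a code table together with the data covered by it.

    def krimp_style_select(candidates, compressed_size):
        """Greedy single-pass selection in the spirit of KRIMP (a sketch only).

        `candidates` is assumed to be pre-sorted in a fixed order (KRIMP uses
        descending support, then descending length). `compressed_size(table)`
        is an assumed helper returning the total number of bits needed to
        encode the code table and the data covered with it.
        """
        code_table = []                    # KRIMP proper starts from all singletons
        best = compressed_size(code_table)
        for candidate in candidates:       # single pass, fixed order
            size = compressed_size(code_table + [candidate])
            if size < best:                # keep the candidate only if it compresses better
                code_table = code_table + [candidate]
                best = size
            # (KRIMP also prunes itemsets whose usage drops; omitted here)
        return code_table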
As an alternative to a descriptive model, we can aim to find a good generative model.
The MINI algorithm proposed by Gallo et al. [20] employs a probability distribution
based on itemsets and their frequencies, and aims to find the set of itemsets that
predicts the data best.
While intuitively appealing, using likelihood alone to score pattern sets is not
enough, since the score only increases as more patterns are included; the set
containing all itemsets will have the highest likelihood. To control the size of the
set we need to exert some control over the output. For example, we can either ask
the user to specify a number of patterns k, or automatically control the number of
itemsets using BIC [59] or MDL [58], in which case the improvement obtained by adding
a new itemset to the result set must be significant.
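As an illustration of BIC-based control, the sketch below accepts a candidate itemset only if the penalized score improves. The log-likelihood values are assumed to come from whatever generative model is in use, and counting one free parameter per itemset is a simplification.

    import math

    def bic(log_likelihood, num_itemsets, num_transactions):
        """Bayesian Information Criterion (lower is better): the data-fit term is
        penalized by a cost that grows with the number of itemsets, here crudely
        counted as one free parameter each."""
        return -2.0 * log_likelihood + num_itemsets * math.log(num_transactions)

    def accept_candidate(ll_without, ll_with, k, n):
        """Add the candidate itemset only if it improves the BIC score, i.e. only
        if the gain in log-likelihood outweighs the log(n)/2 penalty per itemset."""
        return bic(ll_with, k + 1, n) < bic(ll_without, k, n)

    # toy numbers: a gain of 10 in log-likelihood easily beats the penalty for n = 500
    print(accept_candidate(ll_without=-1200.0, ll_with=-1190.0, k=10, n=500))  # True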
The MaxEnt model employed by the MTV algorithm, which we met in the previous
section, does not only lend itself to iterative ranking, but can also be straight-
forwardly used with either of these two model selection criteria [42]. In practice,