Database Reference
In-Depth Information
for unsupervised, or exploratory, pattern mining in binary data—by far the most
well studied part of pattern mining. We do note up front, however, that many of the
discussed measures and algorithms are highly general and applicable to other settings.
We will discuss the topic in three main parts, loosely following the development of
the field over time. That is, in Sect. 2 we discuss relatively simple, absolute measures
of interest—of which frequency is a well-known example. As we will see, applying
these measures leads to problems in terms of redundancy, difficult parameterization,
as well as returning trivial results. In Sect. 3 we discuss, on a relatively high level, the
advanced approaches proposed aim to solve these problems. We discuss two of the
main proponents in Sects. 4 and 5 . In the former we go into detail on approaches that
use statistical tests to select or rank patterns based on how significant they are with
regard to background knowledge. In the latter we cover the relatively new approach
of iterative pattern mining, or, dynamic ranking, where we iteratively update our
background knowledge with the most informative patterns so far.
We note that despite our best efforts, we did not find an ideal taxonomy over all
methods, as some methods exhibit aspects of more than one of these categories. In
such instances we choose to discuss them in the category they fit most naturally, yet
will identify alternate ways of looking at these papers. We identify open research
challenges and round up with conclusions in Sect. 7 .
2
Absolute Measures
In this section we discuss relatively straightforward measures of interestingness. In
particular, we focus on what we call absolute measures. That is, measures that score
patterns using only the data at hand, without contrasting their calculations over the
data to any expectation using statistical tests.
More formally, in this section we consider a specific—and perhaps the most well-
known—class of pattern mining problems, viz., theory mining [ 45 ]. In this setting, a
pattern is defined as a description of an interesting subset of the database. Formally,
this task has been described by Mannila and Toivonen [ 43 ] as follows.
Given a database D , a language
defining subsets of the data, and a selection
predicate q that determines whether an element φ
L
L
describes an interesting subset
of D or not, the task is to find
T (
L
, D , q )
={
φ
L |
q ( D , φ ) is true
}
That is, the task is to find all interesting subsets.
2.1
Frequent Itemsets
The best known instance of theory mining is frequent set mining [ 3 ]. The standard
example for this is the analysis of shopping baskets in a supermarket. Let I be the
set of items the store sells. The database D consists of a set of transactions in which
Search WWH ::




Custom Search