determines the importance of a pattern in this regard is called an interestingness
measure.¹
The traditional pattern mining question is to ask for all patterns in the data that
satisfy some interestingness constraint. For example, all patterns that occur at least
n times, or all patterns that are significant according to a given statistical test.
Intuitively this makes sense, yet in practice this approach rarely leads to satisfactory
results.
The primary cause is the pattern explosion. While strict constraints result in
only a few patterns, these are seldom informative: they are the most obvious patterns,
and hence often long since common knowledge. However, when we loosen the
constraints in order to discover novel associations, the pattern explosion occurs and
we are flooded with results. More often than not, orders of magnitude more patterns
are returned than there are rows in the data. In fact, even for modest amounts of
data, billions of patterns are discovered for non-trivial constraints. Clearly, in such
numbers these patterns are impossible to consider by hand, and very difficult
to use in any other task, thereby effectively negating the goal of mining these
patterns in the first place. Not quite the ideal result.
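To make the explosion concrete, here is a minimal brute-force sketch (the toy database and thresholds are our own illustration, not from the chapter). On a database of only five transactions over five items, lowering the minimum support from 4 to 1 grows the result from 5 frequent patterns to 30, six times the number of rows:

```python
from itertools import combinations

# Hypothetical toy transaction database (our own example): each row omits
# exactly one of the five items.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "d", "e"},
    {"a", "c", "d", "e"},
    {"b", "c", "d", "e"},
]
items = sorted(set().union(*transactions))

def frequent_itemsets(minsup):
    """Brute force: every non-empty itemset occurring in >= minsup rows."""
    found = []
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= minsup:
                found.append(cand)
    return found

# A strict threshold yields few (obvious) patterns; a loose one floods us.
for minsup in (4, 2, 1):
    print(f"minsup={minsup}: {len(frequent_itemsets(minsup))} patterns")
# minsup=4: 5 patterns; minsup=2: 25 patterns; minsup=1: 30 patterns
```

Real datasets have thousands of rows and items, so the same effect produces the billions of patterns mentioned above.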
It does, however, provide us with a second observation on the ideal outcome: we do
not want too many results. In particular, we want to avoid redundancy: every
pattern should be interesting or useful with regard to all of the other patterns in
the result.
Simply put, traditional pattern mining has been asking the wrong question. In
most situations, what we really want is a small, non-redundant, and as interesting
as possible group of patterns. As such, instead of asking for all patterns that satisfy
some constraint, we should ask for the set of patterns that is optimal with regard to a
global interestingness criterion. This means evaluating groups of patterns indirectly,
i.e., by first constructing a model using these patterns, and then scoring the quality of
that model. The main questions are then how to construct such a model, and which
criterion should be used. Clearly, this depends on the task at hand.
In this chapter, we focus on exploratory data analysis. That is, our goal is to
explore the data for new insights, to discover any local structure in the data—in
other words, to discover patterns that describe the most important associations of
the data, patterns that capture the distribution of the data. As such, we are looking
for a set of patterns that models the data well. To this end, we need a criterion
that measures both how well the patterns capture the distribution of the data, and—
to avoid overfitting and redundancy—how complex the set of patterns is. Given a
global interestingness criterion we can perform model selection and identify the
best model. There are a few such criteria available, including Akaike's Information
Criterion (AIC) [2] and the Bayesian Information Criterion (BIC) [53]. For pattern
mining, the Minimum Description Length (MDL) principle [52] is the most natural
choice. It provides a principled, statistically well-founded, yet practical approach for
defining an objective function for descriptive models, which, as patterns describe
part of the data, fits our setting rather well.
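As a rough illustration of such a two-part score, the sketch below compares the total description length L(M) + L(D | M) of two candidate pattern sets. The encoding is our own deliberately crude assumption (uniform per-item codes for the model, Shannon codes over pattern usages for the data, and a naive greedy cover where every pattern set includes all singletons), not the actual scheme developed later in the chapter:

```python
import math

# Tiny hypothetical database (our own example), as tuples of items.
transactions = [("a", "b"), ("a", "b"), ("a", "b", "c"), ("c",)]
alphabet = {"a", "b", "c"}

def cover(t, patterns):
    """Greedily cover a transaction with non-overlapping patterns.
    Assumes the pattern set contains all singletons, so the cover is complete."""
    remaining, used = set(t), []
    for p in sorted(patterns, key=len, reverse=True):
        if set(p) <= remaining:
            used.append(p)
            remaining -= set(p)
    return used

def total_length(patterns):
    """Two-part MDL score: L(M) + L(D | M), in bits."""
    # L(M): each pattern spelled out with log2|alphabet| bits per item.
    l_model = sum(len(p) * math.log2(len(alphabet)) for p in patterns)
    # L(D|M): Shannon code lengths derived from pattern usage in the cover.
    covers = [cover(t, patterns) for t in transactions]
    usage = {}
    for c in covers:
        for p in c:
            usage[p] = usage.get(p, 0) + 1
    total = sum(usage.values())
    l_data = sum(-math.log2(usage[p] / total) for c in covers for p in c)
    return l_model + l_data

singletons = [("a",), ("b",), ("c",)]
with_ab = singletons + [("a", "b")]
print(round(total_length(singletons), 2), round(total_length(with_ab), 2))
# prints 17.25 12.78 -- adding the pattern ("a", "b") compresses the data better
```

Model selection then simply picks the pattern set with the smallest total length: here the set containing the pattern ("a", "b") wins over the singletons-only baseline, because the extra model cost is outweighed by the savings in describing the data.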
¹ See Chap. 5 for a detailed overview of interestingness measures.