Mining and Using Sets of Patterns through Compression - Frequent Pattern Mining

Database Reference

In-Depth Information

Chapter 8

Mining and Using Sets of Patterns through

Compression

Matthijs van Leeuwen and Jilles Vreeken

Abstract In this chapter we describe how to successfully apply the MDL principle

to pattern mining. In particular, we discuss how pattern-based models can be de-

signed and induced by means of compression, resulting in succinct and characteristic

descriptions of the data.

As motivation, we argue that traditional pattern mining asks the wrong question:

instead of asking for all patterns satisfying some interestingness measure, one should

ask for a small, non-redundant, and interesting set of patterns—which allows us

to avoid the pattern explosion. Firmly rooted in algorithmic information theory,

the approach we discuss in this chapter states that the best set of patterns is that

set that compresses the data best. We formalize this problem using the Minimum

Description Length (MDL) principle, describe useful model classes, and briefly

discuss algorithmic approaches to inducing good models from data. Last but not

least, we describe how the obtained models—in addition to showing the key patterns

of the data—can be used for a wide range of data mining tasks; hence showing that

MDL selects useful patterns.

Keywords Compression

·

MDL

·

Pattern set mining

·

Data summarization

1

Introduction

What is the ideal outcome of pattern mining? Which patterns would we really like

to find? Obviously, this depends on the task at hand, and possibly even on the user.

When we are exploring the data for new insights the ideal outcome will be different

than when the goal is to build a good pattern-based classifier.

There are, however, a few important general observations to be made. For starters,

we are not interested in patterns that describe noise—we only want patterns that

identify important associations in the data. In pattern mining, the function that usually

Search WWH ::

Custom Search

Home