Interesting Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

for unsupervised, or exploratory, pattern mining in binary data—by far the most

well studied part of pattern mining. We do note up front, however, that many of the

discussed measures and algorithms are highly general and applicable to other settings.

We will discuss the topic in three main parts, loosely following the development of

the field over time. That is, in Sect. 2 we discuss relatively simple, absolute measures

of interest—of which frequency is a well-known example. As we will see, applying

these measures leads to problems in terms of redundancy, difficult parameterization,

as well as returning trivial results. In Sect. 3 we discuss, on a relatively high level, the

advanced approaches proposed aim to solve these problems. We discuss two of the

main proponents in Sects. 4 and 5 . In the former we go into detail on approaches that

use statistical tests to select or rank patterns based on how significant they are with

regard to background knowledge. In the latter we cover the relatively new approach

of iterative pattern mining, or, dynamic ranking, where we iteratively update our

background knowledge with the most informative patterns so far.

We note that despite our best efforts, we did not find an ideal taxonomy over all

methods, as some methods exhibit aspects of more than one of these categories. In

such instances we choose to discuss them in the category they fit most naturally, yet

will identify alternate ways of looking at these papers. We identify open research

challenges and round up with conclusions in Sect. 7 .

2

Absolute Measures

In this section we discuss relatively straightforward measures of interestingness. In

particular, we focus on what we call absolute measures. That is, measures that score

patterns using only the data at hand, without contrasting their calculations over the

data to any expectation using statistical tests.

More formally, in this section we consider a specific—and perhaps the most well-

known—class of pattern mining problems, viz., theory mining [ 45 ]. In this setting, a

pattern is defined as a description of an interesting subset of the database. Formally,

this task has been described by Mannila and Toivonen [ 43 ] as follows.

Given a database D , a language

defining subsets of the data, and a selection

predicate q that determines whether an element φ

L

∈ L

describes an interesting subset

of D or not, the task is to find

T (

L

, D , q )

={

φ

∈ L |

q ( D , φ ) is true

}

That is, the task is to find all interesting subsets.

2.1

Frequent Itemsets

The best known instance of theory mining is frequent set mining [ 3 ]. The standard

example for this is the analysis of shopping baskets in a supermarket. Let I be the

set of items the store sells. The database D consists of a set of transactions in which

Frequent Pattern Mining

Search WWH ::

Custom Search

Home