Interesting Patterns - Frequent Pattern Mining

discarded. Consequently, we are interested in patterns that the model considers

very unlikely.

2. Iterative pattern ranking. While static pattern ranking addresses the problem of

redundancy with respect to background knowledge, it does not explicitly address

the problem of redundancy between patterns. We can approach this problem more

directly with dynamic ranking: At the beginning we start with a simple model and

find the most surprising pattern(s). Once this pattern is identified, we consider it

'known' and insert it into our model, which updates our expectations. We then

repeat the process. As a result we get a sequence of patterns that are surprising

and non-redundant with regard to the background knowledge and higher ranked

patterns.

3. Pattern set mining. The methods in the above categories measure interestingness

only per individual pattern. The third and last category we consider aims at

identifying the best set of patterns, and hence proposes an interestingness measure

over pattern sets. As such, these measures directly punish redundancy: a pattern

is only as good as its contribution to the set.
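
The iterative ranking loop described in item 2 can be sketched as follows. This is a minimal illustration, not a method taken from the text: the function names, the toy data, and in particular the `expected` function (which greedily covers a pattern with already known itemsets and treats the covering blocks as independent) are assumptions made only to keep the example self-contained.

```python
def support(pattern, transactions):
    """Fraction of transactions containing every item of `pattern`."""
    return sum(pattern <= t for t in transactions) / len(transactions)


def expected(pattern, known, transactions):
    """Toy expectation (an assumption, not the chapter's model): cover
    `pattern` greedily with the largest known itemsets and treat the
    covering blocks as independent of each other."""
    remaining, estimate = set(pattern), 1.0
    for block in sorted(known, key=len, reverse=True):
        if block <= remaining:
            estimate *= support(block, transactions)
            remaining -= block
    for item in remaining:  # uncovered items fall back to single-item supports
        estimate *= support(frozenset([item]), transactions)
    return estimate


def iterative_ranking(candidates, transactions, k):
    """Return up to k (pattern, surprise) pairs, updating the model after each pick."""
    known, ranking, pool = [], [], list(candidates)
    while pool and len(ranking) < k:
        scores = [
            abs(support(p, transactions) - expected(p, known, transactions))
            for p in pool
        ]
        best = max(range(len(pool)), key=scores.__getitem__)
        pattern = pool.pop(best)
        ranking.append((pattern, scores[best]))
        known.append(pattern)  # the pattern is now 'known' and shapes later expectations
    return ranking


# Hypothetical toy data: a and b always co-occur, c is independent of both.
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "ab", "c", "c", "d", "d")]
candidates = [frozenset(p) for p in ("abc", "ab", "ac", "bc")]
ranking = iterative_ranking(candidates, transactions, k=2)
```

On this toy data, {a, b} is picked first. Once it is known, {a, b, c} is no longer surprising, although a static ranking under the initial model would have scored it 0.125.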

4 Static Background Models

In Sect. 2 we discussed absolute interestingness measures, which we can now say

are essentially only based on counting. In this section we will cover slightly more

advanced measures. In particular, we will discuss measures that, instead of relying

just on absolute measurements, contrast these measurements with the expected

measurement for that pattern. The basic intuition here is that the more strongly the

observation deviates from the expectation, the more interesting the pattern is.

Clearly, there are many different ways to express such expectation. Most often

these are calculated using a probabilistic model of the data. Which model is

appropriate depends on the background knowledge we have and/or the assumptions

we are willing to make about the data. As such, in this section we will cover a wide

range of different models that have been proposed to formalize such expectations.

However, in order to be able to identify whether a pattern is interesting, we need

to be able to determine whether the deviation between the observation and the expectation is large

enough. That is, whether the deviation, and hence correspondingly the pattern, is

significant or not. To this end we will discuss a variety of (statistical) tests that have

been proposed to identify interesting patterns.
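
To make the idea of testing a deviation concrete, here is a minimal sketch of one possible test, a one-sided binomial test. It is chosen here purely for illustration and is not necessarily among the tests the text goes on to discuss. It assumes that, under the background model, each of the n transactions contains the pattern independently with the expected frequency; the function names and the numbers are hypothetical.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_significant(n, observed_count, expected_frequency, alpha=0.05):
    """True if the pattern occurs significantly more often than expected."""
    return binomial_upper_tail(n, observed_count, expected_frequency) <= alpha


# Hypothetical numbers: 1000 transactions, expected frequency 0.10,
# pattern observed in 140 of them.
p_value = binomial_upper_tail(1000, 140, 0.10)
```

A small p-value means that the observed count would be unlikely if the background model were correct, which is exactly the kind of deviation described above.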

For clarity, we will start our discussion with the simplest model, the independence

model. We will then use this model as an example to discuss a range of

significance measures. We will then proceed to discuss more complex models that

can incorporate more background knowledge, for which many of these tests are

also applicable. Along the way, we will also discuss interestingness measures specific to

particular models and setups.
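
As a preview of the independence model named above, the following sketch computes the expected frequency of an itemset as the product of its item frequencies and compares it with the observed frequency. The ratio of the two is often called lift. The function names and the toy data are illustrative assumptions.

```python
from math import prod


def item_frequencies(transactions):
    """Relative frequency of each single item."""
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {item: c / len(transactions) for item, c in counts.items()}


def expected_frequency(itemset, freqs):
    """Expected frequency under independence: product of the item frequencies."""
    return prod(freqs[item] for item in itemset)


def observed_frequency(itemset, transactions):
    """Fraction of transactions containing all items of `itemset`."""
    return sum(set(itemset) <= set(t) for t in transactions) / len(transactions)


# Hypothetical toy data in which a and b always occur together.
transactions = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
freqs = item_frequencies(transactions)
exp_ab = expected_frequency({"a", "b"}, freqs)
obs_ab = observed_frequency({"a", "b"}, transactions)
ratio = obs_ab / exp_ab  # values well above 1 indicate more co-occurrence than expected
```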

Before we start, there is one important observation to make. As opposed to the

previous section, the measures we will discuss here are typically not used to mine
