discarded. Consequently, we are interested in patterns that the model considers
very unlikely.
2. Iterative pattern ranking. While static pattern ranking addresses the problem of
redundancy with respect to background knowledge, it does not explicitly address
the problem of redundancy between patterns. We can approach this problem more
directly with dynamic ranking: we start with a simple model and find the most
surprising pattern(s). Once such a pattern is identified, we consider it 'known'
and insert it into our model, which updates our expectations, and then repeat the
process. As a result, we obtain a sequence of patterns that are surprising and
non-redundant with regard to both the background knowledge and the higher-ranked
patterns.
3. Pattern set mining. The methods in the above categories measure interestingness
only per individual pattern. The third and last category we consider aims at
identifying the best set of patterns, and hence proposes interestingness measures
over pattern sets. As such, these measures directly penalize redundancy: a pattern
is only as good as its contribution to the set.
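To make the iterative ranking loop of category 2 concrete, here is a minimal Python sketch. It swaps in a deliberately simple, hypothetical background model rather than any particular model from the literature: items are grouped into blocks that are assumed independent of one another, each block keeps its empirical distribution, and "inserting" a pattern into the model merges every block the pattern touches. Surprise is taken, as an assumption, to be the absolute difference between observed and expected frequency. The function names, the toy data, and the restriction of candidates to itemsets of size 2 and 3 are all illustrative.

```python
from itertools import combinations
from math import prod


def freq(itemset, db):
    """Empirical frequency: fraction of transactions containing `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def expected(itemset, blocks, db):
    """Expected frequency under a (hypothetical) partition model: blocks are
    independent of each other, each block keeps its empirical distribution."""
    return prod(freq(itemset & b, db) for b in blocks if itemset & b)


def surprise(itemset, blocks, db):
    """Absolute deviation between observed and expected frequency."""
    return abs(freq(itemset, db) - expected(itemset, blocks, db))


def iterative_ranking(db, candidates, k):
    """Return up to k patterns, updating the model after each pick."""
    items = set().union(*db)
    blocks = [frozenset({i}) for i in items]  # start with full independence
    remaining = list(candidates)
    ranking = []
    for _ in range(min(k, len(remaining))):
        # 1. Find the most surprising pattern under the current model.
        best = max(remaining, key=lambda x: surprise(x, blocks, db))
        ranking.append(best)
        remaining.remove(best)
        # 2. Treat it as known: merge every block it touches into one block.
        touched = [b for b in blocks if b & best]
        blocks = [b for b in blocks if not (b & best)] + [frozenset().union(*touched)]
    return ranking


# Hypothetical toy data in which a/b and c/d tend to co-occur.
db = [{"a", "b"}, {"a", "b", "c", "d"}, {"c", "d"}, {"a", "b"}, {"e"}, {"c", "d", "e"}]
items = sorted(set().union(*db))
candidates = [frozenset(c) for size in (2, 3) for c in combinations(items, size)]
print(iterative_ranking(db, candidates, 3))
```

On this toy data the first two picks are {a, b} and {c, d}. Once {a, b} has been merged into one block, any candidate lying entirely inside that block has an expected frequency equal to its observed frequency, so its surprise drops to zero; this is how the model update suppresses redundancy.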
4 Static Background Models
In Sect. 2 we discussed absolute interestingness measures, which, as we can now
see, are essentially based only on counting. In this section we will cover slightly
more advanced measures. In particular, we will discuss measures that, instead of
relying solely on absolute measurements, contrast these measurements with the
expected measurement for that pattern. The basic intuition is that the more strongly
the observation deviates from the expectation, the more interesting the pattern is.
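As a minimal sketch, assuming the independence model (with which this section starts) as the source of expectations, the contrast between observation and expectation for an itemset can be expressed as a ratio (commonly called lift) or as a difference (commonly called leverage). The toy data and function names are hypothetical and only serve to illustrate the idea.

```python
from math import prod

# Hypothetical toy transaction database.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
    {"a", "c"},
]


def frequency(itemset, db):
    """Observed frequency: fraction of transactions containing all items."""
    return sum(1 for t in db if itemset <= t) / len(db)


def expected_frequency(itemset, db):
    """Expected frequency if all items occurred independently of each other."""
    return prod(frequency({item}, db) for item in itemset)


def deviation(itemset, db):
    """Contrast observation and expectation as a ratio (lift) and a difference (leverage)."""
    observed = frequency(itemset, db)
    expected = expected_frequency(itemset, db)
    lift = observed / expected if expected > 0 else float("inf")
    return observed, expected, lift, observed - expected


print(deviation({"a", "b"}, transactions))
```

For {a, b}, the observed frequency is 0.6 while the independence model expects 0.8 × 0.6 = 0.48, giving a lift of about 1.25 and a leverage of about 0.12.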
Clearly, there are many different ways to express such expectations. Most often
they are calculated using a probabilistic model of the data. Which model is
appropriate depends on the background knowledge we have and/or the assumptions
we are willing to make about the data. As such, in this section we will cover a wide
range of models that have been proposed to formalize such expectations.
However, to identify whether a pattern is interesting, we need to be able to decide
whether the deviation between the observation and the expectation is large enough.
That is, we need to know whether the deviation, and hence the pattern, is
significant. To this end, we will discuss a variety of (statistical) tests that have
been proposed to identify interesting patterns.
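One common example of such a test, sketched here under stated assumptions, is a one-sided binomial test: if each of the n transactions independently contains the itemset with probability p (here the product of the empirical item frequencies, i.e. the independence model), the p-value is the probability of observing at least the actual support. Treating the estimated item frequencies as the true probabilities is a simplification, and the data and function names are hypothetical.

```python
from math import comb, prod

# Hypothetical toy transaction database.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"c", "d"},
    {"a", "c"},
]


def upper_tail_binomial(count, n, p):
    """P(X >= count) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(count, n + 1))


def independence_p_value(itemset, db):
    """One-sided p-value of the observed support of `itemset`, assuming each
    transaction contains it with the probability the independence model predicts."""
    n = len(db)
    support = sum(1 for t in db if itemset <= t)
    # Plug-in estimate: empirical item frequencies are treated as true probabilities.
    p = prod(sum(1 for t in db if item in t) / n for item in itemset)
    return upper_tail_binomial(support, n, p)


print(independence_p_value({"a", "b"}, transactions))
```

Here {a, b} occurs in 3 of 5 transactions against an expected 0.48 × 5 = 2.4, and the p-value is about 0.46: on so little data, the deviation is far from significant.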
For clarity, we will start our discussion with the simplest model, the independence
model. We will then use this model as an example to discuss a range of
significance measures. Next, we will discuss more complex models that can
incorporate more background knowledge, and to which many of these tests also
apply. Along the way, we will also discuss interestingness measures specific to
particular models and setups.
Before we start, there is one important observation to make. Unlike those in the
previous section, the measures we will discuss here are typically not used to mine