Database Reference
In-Depth Information
items is important as well. As a basic frequent itemset mining system does not support
utilities, we cannot use it directly; we either have to:
￿
understand the code of the frequent itemset mining algorithm to add an additional
constraint to it;
￿
or, write a second algorithm for processing the results of the frequent itemset
mining system to evaluate the additional constraint for each of the itemsets found.
Both options are cumbersome. The second option is likely to be computationally
inefficient if the number of frequent itemsets is large. The first option can be efficient,
provided that the programmer has a deep understanding of the code that is being
modified.
These disadvantages have led researchers to develop more general systems that
provide easy-to-use interfaces for specifying the constraints that the pattern mining
systems need to use during the search. The development of these systems has involved
several challenges:
￿
the identification of general classes of constraints, all of which can be processed
in a generic and similar way;
￿
the development of languages in which constraints can be expressed, such that
all expressions in the language correspond to constraints in a class of constraints
supported by a system;
￿
the development of search algorithms that can deal with constraints in a certain
class.
This chapter will provide an overview of the state-of-the-art for each of these chal-
lenges. We will first formalize the problem of constraint-based pattern mining,
including a discussion of different classes of constraints. Subsequently, we will
discuss the most common search algorithms for these classes of constraints. Finally,
we will discuss the languages that allow for the expression of constraints in pattern
mining.
2
Problem Definition
Constraint-based mining starts from the observation that many pattern mining
problems can be seen as instances of the following generic problem statement:
Given
￿
a data language L D
2 L D with transactions
￿ a pattern language L π
￿ a constraint ϕ : L π ×
￿
a database D
2 L D
→{
0, 1
}
Find all patterns π
L π for which ϕ ( π , D )
=
1.
The pattern language typically describes the syntax of the patterns we wish to
find in the data. Constraints typically describe the statistical, syntactical, or other
requirements that we wish these patterns to satisfy on the data.
Search WWH ::




Custom Search