sold, package A is also sold (i.e., they appear in the same transaction). In this case, the
measure computes
$(P(B \mid A) + P(A \mid B))/2 = (0.01 + 1)/2 = 0.505 \gg 0.02$,
which indicates that A and B are positively correlated instead of negatively correlated. This also
matches our intuition.
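The arithmetic in this example can be checked with a minimal sketch. The measure here is the average of the two conditional probabilities (a form commonly known as the Kulczynski measure). The function name and the transaction counts below are hypothetical, chosen only so that P(B|A) = 0.01 and P(A|B) = 1 as in the text; treating 0.02 as the negative-correlation threshold is likewise an assumption based on the comparison made in the example.

```python
def avg_conditional(n_a, n_b, n_ab):
    """Average of P(B|A) and P(A|B), computed from transaction counts.

    n_a  : number of transactions containing A
    n_b  : number of transactions containing B
    n_ab : number of transactions containing both A and B
    """
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return (p_b_given_a + p_a_given_b) / 2


# Hypothetical counts: A is sold 100 times, B once, and that one sale of B
# is in a transaction that also contains A.
THRESHOLD = 0.02  # assumed threshold, taken from the comparison in the text

value = avg_conditional(n_a=100, n_b=1, n_ab=1)   # approximately 0.505
negatively_correlated = value < THRESHOLD          # False for these counts
```

Because the value is far above the threshold, the pair is not flagged as negatively correlated, which agrees with the conclusion stated in the text.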
With this new definition of negative correlation, efficient methods can easily be
derived for mining negative patterns in large databases. This is left as an exercise for
interested readers.
7.3 Constraint-Based Frequent Pattern Mining
A data mining process may uncover thousands of rules from a given data set, most of
which end up being unrelated or uninteresting to users. Often, users have a good sense of
which “direction” of mining may lead to interesting patterns and the “form” of the patterns
or rules they want to find. They may also have a sense of “conditions” for the rules,
which would eliminate the discovery of certain rules that they know would not be of
interest. Thus, a good heuristic is to have the users specify such intuition or expectations
as constraints to confine the search space. This strategy is known as constraint-based
mining. The constraints can include the following:
Knowledge type constraints: These specify the type of knowledge to be mined, such
as association, correlation, classification, or clustering.
Data constraints: These specify the set of task-relevant data.
Dimension/level constraints: These specify the desired dimensions (or attributes)
of the data, the abstraction levels, or the level of the concept hierarchies to be used in
mining.
Interestingness constraints: These specify thresholds on statistical measures of rule
interestingness such as support, confidence, and correlation.
Rule constraints: These specify the form of, or conditions on, the rules to be mined.
Such constraints may be expressed as metarules (rule templates), as the maximum or
minimum number of predicates that can occur in the rule antecedent or consequent,
or as relationships among attributes, attribute values, and/or aggregates.
These constraints can be specified using a high-level declarative data mining query
language and user interface.
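As a rough illustration of how interestingness and rule constraints might be written down and applied, the following sketch filters a small set of already-mined rules. Everything in it is hypothetical: the `Rule` class, the `attribute=value` encoding of predicates, the sample rules, and the threshold values are assumptions made for the example, not part of any particular query language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # predicates encoded as "attribute=value" strings
    consequent: frozenset
    support: float
    confidence: float


def satisfies(rule, min_support, min_confidence, max_antecedent, consequent_attr):
    """Check interestingness constraints and two simple rule constraints."""
    # Interestingness constraints: thresholds on support and confidence.
    if rule.support < min_support or rule.confidence < min_confidence:
        return False
    # Rule constraint: maximum number of predicates in the antecedent.
    if len(rule.antecedent) > max_antecedent:
        return False
    # Rule constraint (template-like): every consequent predicate must be
    # on the requested attribute.
    return all(p.split("=")[0] == consequent_attr for p in rule.consequent)


# Hypothetical mined rules.
rules = [
    Rule(frozenset({"age=young", "income=high"}), frozenset({"buys=laptop"}), 0.05, 0.70),
    Rule(frozenset({"age=young"}), frozenset({"income=high"}), 0.10, 0.80),
    Rule(frozenset({"age=young", "income=high", "student=yes"}),
         frozenset({"buys=laptop"}), 0.03, 0.90),
    Rule(frozenset({"income=low"}), frozenset({"buys=printer"}), 0.01, 0.90),
]

kept = [r for r in rules
        if satisfies(r, min_support=0.02, min_confidence=0.6,
                     max_antecedent=2, consequent_attr="buys")]
```

With these assumed thresholds only the first rule survives: the second predicts the wrong attribute, the third has too many antecedent predicates, and the fourth falls below the support threshold.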
The first four constraint types have already been addressed in earlier sections of this
book and this chapter. In this section, we discuss the use of rule constraints to focus the
mining task. This form of constraint-based mining allows users to describe the rules that
they would like to uncover, thereby making the data mining process more effective. In
addition, a sophisticated mining query optimizer can be used to exploit the constraints
specified by the user, thereby making the mining process more efficient.
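One simple way a constraint can make mining more efficient, rather than only being checked afterward, is to apply it during the search. The sketch below is an illustration under stated assumptions, not the optimizer described in the text: it uses a basic level-wise frequent itemset search and a hypothetical constraint on the maximum itemset size. The function name, parameters, and sample transactions are all invented for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=None):
    """Level-wise frequent itemset search.

    If max_size is given, the search stops once that size is reached, so
    larger candidates are never generated or counted.
    Returns (itemset -> count, number of candidates counted).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result, counted, size = {}, 0, 1
    while level and (max_size is None or size <= max_size):
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        counted += len(level)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        # Join pairs of frequent itemsets into candidates one item larger.
        level = list({a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == size + 1})
        size += 1
    return result, counted


# Hypothetical transactions.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

all_sets, counted_all = frequent_itemsets(data, min_count=2)
pushed, counted_pushed = frequent_itemsets(data, min_count=2, max_size=2)

# Filtering after an unconstrained run gives the same answer as pushing the
# constraint into the search, but the pushed run counts fewer candidates.
post_filtered = {s: n for s, n in all_sets.items() if len(s) <= 2}
```

Both approaches return the same itemsets, so the result the user sees is unchanged; the difference lies in how much work is done to produce it.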
 