Advanced Pattern Mining - Data Mining: Concepts and Techniques


sold, package A is also sold (i.e., they appear in the same transaction). In this case, the measure computes

$$\frac{P(B \mid A) + P(A \mid B)}{2} = \frac{0.01 + 1}{2} = 0.505 \gg 0.02,$$

which indicates that A and B are positively correlated instead of negatively correlated. This also matches our intuition.

With this new definition of negative correlation, efficient methods can easily be derived for mining negative patterns in large databases. This is left as an exercise for interested readers.
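As a starting point for that exercise, here is a minimal Python sketch under stated assumptions. It considers only pairs of single items, requires both items to be frequent (support count at least `min_sup`), and reports a pair as negatively correlated when the average of P(B|A) and P(A|B) (the Kulczynski measure) falls below a threshold `epsilon`. The function names, parameters, brute-force pair counting, and transaction data are all hypothetical illustrations, not the book's method.

```python
from collections import Counter
from itertools import combinations


def kulczynski(sup_x: int, sup_y: int, sup_xy: int) -> float:
    """Average of P(Y|X) and P(X|Y), computed from absolute support counts."""
    return (sup_xy / sup_x + sup_xy / sup_y) / 2


def negative_item_pairs(transactions, min_sup, epsilon):
    """Report pairs of frequent items whose Kulczynski value is below epsilon.

    Both items must be frequent (support count >= min_sup); each reported
    pair is returned together with its measure value.
    """
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = sorted(i for i, c in item_counts.items() if c >= min_sup)
    frequent_set = set(frequent)

    # Count co-occurrences of frequent items; pair tuples are kept in sorted order.
    pair_counts = Counter()
    for t in transactions:
        present = sorted(set(t) & frequent_set)
        pair_counts.update(combinations(present, 2))

    negatives = []
    for x, y in combinations(frequent, 2):
        value = kulczynski(item_counts[x], item_counts[y], pair_counts[(x, y)])
        if value < epsilon:
            negatives.append((x, y, value))
    return negatives


# Hypothetical data: A and B each occur in 100 transactions, together in only one.
rare_together = [["A", "B"]] + [["A"]] * 99 + [["B"]] * 99
print(negative_item_pairs(rare_together, min_sup=50, epsilon=0.02))
# prints [('A', 'B', 0.01)]

# Hypothetical data: every transaction with B also contains A.
always_with_a = [["A", "B"]] * 10 + [["A"]] * 990
print(negative_item_pairs(always_with_a, min_sup=5, epsilon=0.02))
# prints []
```

In the second data set, P(B|A) = 10/1000 = 0.01 and P(A|B) = 1, so the measure is 0.505 and the pair is not reported, mirroring the situation described in the text.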

7.3 Constraint-Based Frequent Pattern Mining

A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to users. Often, users have a good sense of which “direction” of mining may lead to interesting patterns and the “form” of the patterns or rules they want to find. They may also have a sense of “conditions” for the rules, which would eliminate the discovery of certain rules that they know would not be of interest. Thus, a good heuristic is to have the users specify such intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining. The constraints can include the following:

Knowledge type constraints: These specify the type of knowledge to be mined, such as association, correlation, classification, or clustering.

Data constraints: These specify the set of task-relevant data.

Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, the abstraction levels, or the level of the concept hierarchies to be used in mining.

Interestingness constraints: These specify thresholds on statistical measures of rule interestingness such as support, confidence, and correlation.

Rule constraints: These specify the form of, or conditions on, the rules to be mined. Such constraints may be expressed as metarules (rule templates), as the maximum or minimum number of predicates that can occur in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates.
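To make the rule-constraint category concrete, here is a minimal Python sketch. It assumes rules have already been mined and treats each rule constraint as a Boolean predicate over a rule. `Rule`, `RuleConstraint`, the three constraint builders, the item names, and the prices are all hypothetical; they only loosely mirror the forms listed above (a limit on antecedent predicates, a template-like condition on the consequent, and an aggregate over attribute values).

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Rule:
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    support: float
    confidence: float


# Hypothetical representation: a rule constraint is any predicate over a rule.
RuleConstraint = Callable[[Rule], bool]


def max_antecedent_size(k: int) -> RuleConstraint:
    """At most k predicates (items) may appear in the rule antecedent."""
    return lambda r: len(r.antecedent) <= k


def consequent_contains(item: str) -> RuleConstraint:
    """A crude stand-in for a rule template that fixes part of the consequent."""
    return lambda r: item in r.consequent


def total_price_at_most(prices: Dict[str, float], limit: float) -> RuleConstraint:
    """An aggregate constraint over every item mentioned in the rule."""
    return lambda r: sum(prices[i] for i in r.antecedent | r.consequent) <= limit


def satisfying(rules: Iterable[Rule], constraints: List[RuleConstraint]) -> List[Rule]:
    """Keep only the rules that satisfy every user-specified constraint."""
    return [r for r in rules if all(c(r) for c in constraints)]


# Hypothetical mined rules and item prices.
prices = {"laptop": 900, "mouse": 20, "bag": 50, "office software": 150,
          "printer": 200, "paper": 5}
rules = [
    Rule(frozenset({"laptop"}), frozenset({"office software"}), 0.05, 0.60),
    Rule(frozenset({"laptop", "mouse", "bag"}), frozenset({"office software"}), 0.01, 0.70),
    Rule(frozenset({"printer"}), frozenset({"paper"}), 0.08, 0.90),
]
kept = satisfying(rules, [max_antecedent_size(2),
                          consequent_contains("office software"),
                          total_price_at_most(prices, 1200)])
print(kept)
```

Only the first rule survives: the second has three antecedent items and the third has the wrong consequent. Filtering after the fact keeps only the rules a user asked for, but it does nothing to shrink the search performed by the miner itself.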

These constraints can be specified using a high-level declarative data mining query language and user interface.

The first four constraint types have already been addressed in earlier sections of this book and this chapter. In this section, we discuss the use of rule constraints to focus the mining task. This form of constraint-based mining allows users to describe the rules that they would like to uncover, thereby making the data mining process more effective. In addition, a sophisticated mining query optimizer can be used to exploit the constraints specified by the user, thereby making the mining process more efficient.
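One standard way an optimizer can exploit a constraint is to push it into the search instead of checking it afterward. The minimal Python sketch below assumes an Apriori-style level-wise miner and an antimonotonic constraint, that is, one where a violating itemset guarantees that every superset also violates it (here a hypothetical `sum(price) <= 40` budget over non-negative prices). The function name, parameters, prices, and transactions are illustrative assumptions; this is not presented as the book's optimizer.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Set

Itemset = FrozenSet[str]


def apriori_with_constraint(transactions: Iterable[Set[str]], min_sup: int,
                            antimonotone: Callable[[Itemset], bool]) -> Dict[Itemset, int]:
    """Level-wise frequent itemset mining with an antimonotonic constraint
    pushed into candidate generation.

    Because the constraint is antimonotonic, every superset of a violating
    itemset also violates it, so discarding violators early loses no answers.
    """
    tsets = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in tsets for i in t}
    level = {c for c in level if antimonotone(c)}
    result: Dict[Itemset, int] = {}
    k = 1
    while level:
        # Count support only for candidates that already satisfy the constraint.
        counts = {c: sum(1 for t in tsets if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates; keep a candidate only if
        # all its k-subsets are frequent (Apriori pruning) and it satisfies the
        # constraint (antimonotone pruning).
        next_level = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if (len(u) == k + 1
                    and all(frozenset(s) in frequent for s in combinations(u, k))
                    and antimonotone(u)):
                next_level.add(u)
        level = next_level
        k += 1
    return result


prices = {"a": 10, "b": 20, "c": 50}  # hypothetical item prices


def within_budget(itemset: Itemset) -> bool:
    """sum(price) <= 40 is antimonotonic because prices are non-negative."""
    return sum(prices[i] for i in itemset) <= 40


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
pushed = apriori_with_constraint(transactions, 2, within_budget)
unconstrained = apriori_with_constraint(transactions, 2, lambda s: True)
post_filtered = {s: n for s, n in unconstrained.items() if within_budget(s)}
print(pushed == post_filtered)  # prints True
```

The pushed version never counts support for `{c}` or any itemset containing it, yet it returns the same answer as mining all seven frequent itemsets and filtering afterward.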
