and has been leveraged by [44] to use one type of mining technique to address different tasks: classification, subgroup discovery, and conceptual clustering.
Eliminating Minimum Support The settings above essentially apply statistical measures in addition to minimum support, which remains a parameter that needs to be set. Several approaches have successfully eliminated this parameter.
The main observation is that thresholds on quality measures can be translated
into support thresholds; hence, if a support threshold is not given, it is possible to
automatically determine an additional support threshold for use in a pattern mining
algorithm.
Returning to the accuracy measure, we can set a minimum threshold on it: acc(r) ≥ θ_acc. This can be transformed into p + (N − n) ≥ θ_acc · |D|, and further into p ≥ θ_acc · |D| − N + n ≥ θ_acc · |D| − N. So we derive support constraints based on the threshold on the quality measure itself [31].
For measures that are convex, which includes the ones mentioned above but also many others, a similar argument is possible: convex functions take their maxima at extreme points, i.e. points with p = 0 or n = 0. Thus, based on a threshold on the minimal acceptable values for a statistical scoring function, thresholds on a pattern's p and n can be derived and enforced during mining. This makes it effective to use the quality measure to prune during rule mining [6, 7, 11, 12, 15, 19, 29, 31, 35, 39, 43–45].
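To make the derivation concrete, here is a minimal Python sketch that turns an accuracy threshold into a minimum support constraint on p. The function name and counting conventions (p and n as the positives and negatives covered by a rule, N the total number of negatives, |D| the dataset size) are illustrative assumptions, not code from any of the cited systems.

```python
# Sketch: translating an accuracy threshold into a minimum support constraint.
# Assumes acc(r) = (p + (N - n)) / |D|, with p (n) the number of positive
# (negative) transactions covered by rule r, N the total number of negatives,
# and |D| the dataset size. Names are illustrative.

import math

def min_positive_support(theta_acc: float, total_neg: int, db_size: int) -> int:
    """Smallest p any rule must cover to possibly reach acc(r) >= theta_acc.

    From acc(r) >= theta_acc:
        p + (N - n) >= theta_acc * |D|
        p >= theta_acc * |D| - N + n >= theta_acc * |D| - N
    so p >= ceil(theta_acc * |D| - N) is a valid pruning bound.
    """
    return max(0, math.ceil(theta_acc * db_size - total_neg))

# A dataset with 60 positives and 40 negatives; demanding 80% accuracy
# forces every candidate rule to cover at least 0.8*100 - 40 = 40 positives.
print(min_positive_support(0.8, total_neg=40, db_size=100))  # -> 40
```

Any standard frequent pattern miner can then enforce this bound as an ordinary minimum support threshold on the positive transactions.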
Thus far we have discussed approaches that use thresholds and exhaustively mine all patterns that satisfy the thresholds. An even easier and often more effective approach is to perform top-k mining instead. In top-k mining, one is interested in finding only those patterns which have the k highest scores; the only parameter that needs to be specified is k. This has been leveraged by [7, 11, 12, 15, 35, 42, 45]. The nature of this mining process means that the threshold(s) increase during mining, pruning more and more candidate patterns as the search progresses. To achieve a quick increase of the threshold, it can be useful to perform a best-first search during which it is always the rule with the highest upper bound that is specialized.
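The best-first, rising-threshold scheme can be sketched as follows. The toy dataset, the quality measure (coverage difference p − n), and the optimistic upper bound (a specialization can at best keep all covered positives and lose all covered negatives, so the bound is p) are all illustrative assumptions; the actual top-k miners cited above use tighter bounds and more refined search.

```python
# Sketch of top-k pattern mining with best-first search and a dynamic
# threshold. Patterns are itemsets, refined in canonical (sorted) order
# so each pattern is generated exactly once. Data and measures are toys.

import heapq

# Toy transactional data: (items, label), label 1 = positive.
DATA = [
    ({"a", "b"}, 1), ({"a", "c"}, 1), ({"a", "b", "c"}, 1),
    ({"b", "c"}, 0), ({"c"}, 0), ({"b"}, 0),
]
ITEMS = sorted({i for t, _ in DATA for i in t})

def counts(pattern):
    covered = [lab for t, lab in DATA if set(pattern) <= t]
    p = sum(covered)
    return p, len(covered) - p          # (positives, negatives) covered

def score(pattern):
    p, n = counts(pattern)
    return p - n                        # toy quality measure

def upper_bound(pattern):
    # Admissible: no specialization can score above the positives it keeps.
    p, _ = counts(pattern)
    return p

def top_k(k):
    best = []                           # min-heap of (score, pattern)
    frontier = [(-upper_bound(()), ())] # max-heap on upper bound (negated)
    while frontier:
        neg_ub, pattern = heapq.heappop(frontier)
        # Dynamic threshold: once k patterns are held, prune any candidate
        # whose bound cannot beat the current k-th best score. The bound
        # also covers all specializations, so we skip expansion too.
        if len(best) == k and -neg_ub <= best[0][0]:
            continue
        heapq.heappush(best, (score(pattern), pattern))
        if len(best) > k:
            heapq.heappop(best)         # the threshold rises here
        last = pattern[-1] if pattern else ""
        for item in ITEMS:              # canonical refinement: append larger items
            if item > last:
                child = pattern + (item,)
                heapq.heappush(frontier, (-upper_bound(child), child))
    return sorted(best, reverse=True)

print(top_k(2))
```

Because the frontier always pops the pattern with the highest upper bound first, high-scoring rules are found early, which drives the k-th best score, and hence the pruning threshold, up quickly.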
2.3 Numerical Target Values
As opposed to the setting discussed in the preceding sections, in which each transac-
tion is labeled with one out of a set of discrete labels, a numerical variable of interest
can have potentially infinitely many values. As a result of this, each transaction in the
data may have a different target value, and learning a predictor for particular values,
or partitioning the data into subsets consisting of transactions with the same target
value, are strategies that are unlikely to be successful. Nevertheless, there exist a
number of techniques for discretizing numerical values, in which case the problem
can be reduced to the classification setting.
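One such discretization technique is equal-frequency binning, in which the bin edges are chosen so that each resulting class receives roughly the same number of transactions; the bin index then serves as the discrete label. The sketch below is illustrative, not a specific method from the literature.

```python
# Sketch: reducing a numeric target to the discrete-label setting via
# equal-frequency discretization. Each transaction's target value is
# replaced by a bin label in {0, ..., num_bins - 1}, after which the
# supervised pattern mining techniques for discrete labels apply.

def equal_frequency_bins(values, num_bins):
    """Assign each numeric value a bin label with ~equal bin sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Rank-based assignment keeps bin populations balanced.
        labels[i] = min(rank * num_bins // len(values), num_bins - 1)
    return labels

targets = [2.5, 0.1, 9.8, 4.4, 7.0, 1.3]
print(equal_frequency_bins(targets, 3))  # -> [1, 0, 2, 1, 2, 0]
```

Equal-width binning (splitting the value range into intervals of equal length) is a common alternative; which choice is appropriate depends on the distribution of the target variable.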