Alternatively, one can either attempt to mine patterns that partition the data into
transactions that have approximately the same numerical value, or those that can be
used as elements of a regression function that outputs a numerical result based on
their appearance in a transaction. An interestingness measure that can be used in the
former case is interclass variance:
Definition 2.7
Given a data set with numerical labels of the form D_Y = {(d_1, y_1), ..., (d_n, y_n)}, y_i ∈ ℝ, and a pattern π, the average y in a subset D of that data set is:

$$\mathit{avg}(D) = \frac{1}{|D|} \sum_{(d_i, y_i) \in D} y_i$$

The interclass variance of π is defined as:

$$\mathit{var}(\pi) = |\mathit{cov}(\pi)|\,\big(\mathit{avg}(\mathit{cov}(\pi)) - \mathit{avg}(D_Y)\big)^2 + |D_Y \setminus \mathit{cov}(\pi)|\,\big(\mathit{avg}(D_Y \setminus \mathit{cov}(\pi)) - \mathit{avg}(D_Y)\big)^2$$
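As a concrete illustration of this definition, the following is a minimal Python sketch that computes the interclass variance of an itemset pattern over a numerically labelled transaction data set. The helper names (`covers`, `interclass_variance`) and the toy data are our own, not taken from the cited works.

```python
from typing import FrozenSet, List, Tuple

Transaction = FrozenSet[str]
LabelledData = List[Tuple[Transaction, float]]  # (d_i, y_i) pairs

def covers(pattern: FrozenSet[str], transaction: Transaction) -> bool:
    # An itemset pattern covers a transaction if all of its items occur in it.
    return pattern <= transaction

def avg(labels: List[float]) -> float:
    # avg(D): mean label of a (non-empty) subset of the data.
    return sum(labels) / len(labels)

def interclass_variance(pattern: FrozenSet[str], data: LabelledData) -> float:
    # Split D_Y into cov(pi) and its complement, then apply the definition:
    # |cov(pi)| (avg(cov(pi)) - avg(D_Y))^2
    #   + |D_Y \ cov(pi)| (avg(D_Y \ cov(pi)) - avg(D_Y))^2
    in_cover = [y for d, y in data if covers(pattern, d)]
    out_cover = [y for d, y in data if not covers(pattern, d)]
    overall = avg([y for _, y in data])
    var = 0.0
    if in_cover:
        var += len(in_cover) * (avg(in_cover) - overall) ** 2
    if out_cover:
        var += len(out_cover) * (avg(out_cover) - overall) ** 2
    return var

# Example: a pattern whose cover has labels far from the overall mean scores high.
data: LabelledData = [
    (frozenset({"a", "b"}), 10.0),
    (frozenset({"a", "b", "c"}), 9.0),
    (frozenset({"c"}), 1.0),
    (frozenset({"d"}), 2.0),
]
print(interclass_variance(frozenset({"a", "b"}), data))
```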
Interclass variance is convex, which means that thresholds on its value can be translated into thresholds on support values, and thresholded or top-k mining used in the same manner as for discrete target values.
In the latter case, works such as [13, 33, 34] have chosen linear regression functions that weight the contributions of individual patterns. Based on these weights, the authors define a quality function for individual patterns, and derive upper bounds that they use to perform top-k mining for component patterns of the regression model.
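To illustrate the general idea (this is a generic sketch, not the specific methods of [13, 33, 34]), one can encode each transaction by binary features indicating which candidate patterns it contains and fit a linear regression; the learned weight of a pattern's feature is then its contribution to the predicted numerical value, from which a per-pattern quality score could be derived. All data and names below are hypothetical.

```python
import numpy as np

# Hypothetical toy data: transactions, numerical targets, and candidate patterns.
transactions = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"c"}]
y = np.array([10.0, 6.0, 5.0, 11.0, 1.0])
patterns = [{"a"}, {"b"}, {"c"}, {"a", "b"}]

# Encode each transaction as a binary vector of pattern occurrences,
# plus an intercept column.
X = np.array([[1.0] + [1.0 if p <= t else 0.0 for p in patterns]
              for t in transactions])

# Least-squares fit: the weight of each pattern column is its contribution
# to the predicted numerical value whenever that pattern occurs.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, weights = w[0], w[1:]

for p, wp in zip(patterns, weights):
    print(sorted(p), round(float(wp), 3))
```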
3 Supervised Pattern Set Mining
The result of a supervised pattern mining operation, as so often in pattern mining settings, is typically a very large set of redundant and contradictory patterns. Even when mining only the top-k patterns, many of those will cover (almost) the same instances. As we mentioned in the introduction, when constructing classifiers, redundant patterns or patterns that are irrelevant in the presence of others can be undesirable. If the classifier takes the form of an unordered rule set, for instance, which we will describe in Sect. 4, certain rules could strongly boost each other, far in excess of their actual relevance and usefulness.
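One simple way to make such redundancy visible is to compare the covers of the mined patterns directly, for instance via Jaccard similarity; the sketch below (hypothetical data and helper names) flags pattern pairs whose covers overlap almost completely.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

def cover(pattern: FrozenSet[str], transactions: List[FrozenSet[str]]) -> Set[int]:
    # Indices of the transactions the pattern occurs in.
    return {i for i, t in enumerate(transactions) if pattern <= t}

def jaccard(a: Set[int], b: Set[int]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

transactions = [frozenset(t) for t in
                [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c", "d"}]]
patterns = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"c", "d"})]

# Two patterns whose covers are (nearly) identical carry redundant information.
for p, q in combinations(patterns, 2):
    overlap = jaccard(cover(p, transactions), cover(q, transactions))
    if overlap >= 0.9:
        print(sorted(p), sorted(q), "cover overlap:", round(overlap, 2))
```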
Hence many techniques in the literature include a mechanism for mining or selecting a subset of the result set. Whereas the techniques for supervised pattern mining were intended to improve on Machine Learning techniques, replacing heuristics with exhaustive search, the methods for supervised pattern set mining are strongly inspired by Machine Learning techniques. In particular, both sequential (covering/re-weighting) or separate-and-conquer techniques, and decision-tree-like divide-and-conquer techniques, can be found time and again in works on supervised pattern mining.
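As an illustration of the sequential covering (separate-and-conquer) flavour of these approaches, the following sketch greedily selects patterns by a quality score and removes the instances each chosen pattern covers before scoring the next candidate. It is a generic outline with hypothetical names, not a particular published algorithm.

```python
from typing import Callable, FrozenSet, List

Transaction = FrozenSet[str]

def sequential_covering(patterns: List[FrozenSet[str]],
                        transactions: List[Transaction],
                        quality: Callable[[FrozenSet[str], List[Transaction]], float],
                        k: int) -> List[FrozenSet[str]]:
    # Greedy separate-and-conquer selection of at most k patterns.
    remaining = list(range(len(transactions)))
    selected: List[FrozenSet[str]] = []
    while remaining and len(selected) < k:
        # Score candidates only on the instances not yet covered.
        pool = [transactions[i] for i in remaining]
        best = max(patterns, key=lambda p: quality(p, pool))
        if quality(best, pool) <= 0.0:
            break
        selected.append(best)
        # Remove (separate) the instances the chosen pattern covers.
        remaining = [i for i in remaining if not best <= transactions[i]]
    return selected

# Example with a simple coverage-count quality function.
transactions = [frozenset(t) for t in
                [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"d"}]]
patterns = [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})]
count_cover = lambda p, ts: float(sum(1 for t in ts if p <= t))
print(sequential_covering(patterns, transactions, count_cover, k=3))
```

A re-weighting variant, as mentioned above, would down-weight covered instances instead of removing them outright, so that later patterns can still partially cover them.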
There are two widespread approaches to pattern set mining. One is post-processing: