An arrangement in a contingency table invites the use of well-established measures
such as Information Gain or $\chi^2$ to mine correlating [29], contrast [6], or
discriminating patterns [11]. Similarly, the growth rate can be used to mine emerging
patterns [14, 26, 37]. It divides the support in one class by the support in the other.
A measure that is often used in subgroup discovery is Weighted Relative Accuracy
[23]:
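Definition 2.6
Given a rule $r \Rightarrow C^+$, its Weighted Relative Accuracy is defined as
\[
\mathrm{WRAcc}(r \Rightarrow C^+) = \frac{supp_{D}(r)}{|D|} \left( \frac{supp_{D^+}(r)}{supp_{D}(r)} - \frac{|D^+|}{|D|} \right).
\]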
It is instructive to compare accuracy and WRAcc to gain a better understanding
of the conceptual differences between classification and subgroup discovery.
Since the final goal of classification is to find rules with good predictive accuracy,
accuracy treats covering one negative instance fewer as equal to covering one positive
instance more. Consider a data set consisting of 60 instances in $D^+$, and 40 in
$D^-$, and a rule covering 40 positive and 15 negative instances. Its accuracy is 0.65,
and rules that covered 5 more positive instances, or 5 fewer negative instances, would
both achieve a (better) accuracy of 0.7. In the case of WRAcc, the situation is
different: the original rule would have a score of 0.07, and while covering 5 fewer
negative instances improves it to 0.10, covering 5 more positive instances yields a
smaller improvement (to 0.09).
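The arithmetic of this example can be checked with a minimal Python sketch. The
function names and the count-based arguments (tp and fp for covered positives and
negatives) are assumptions made for illustration. Accuracy is computed by reading the
rule as a classifier that predicts the positive class for covered instances and the
negative class for all others.

```python
def accuracy(tp, fp, n_pos, n_neg):
    """Accuracy of r => C+ read as a classifier: covered instances are
    predicted positive, uncovered instances negative."""
    tn = n_neg - fp  # negatives the rule does not cover are classified correctly
    return (tp + tn) / (n_pos + n_neg)


def wracc(tp, fp, n_pos, n_neg):
    """Weighted Relative Accuracy: coverage times (precision minus class prior)."""
    n = n_pos + n_neg
    covered = tp + fp
    if covered == 0:  # a rule covering nothing gets a score of 0 by convention here
        return 0.0
    return (covered / n) * (tp / covered - n_pos / n)


N_POS, N_NEG = 60, 40  # the data set from the example
rules = {
    "original rule (40 pos, 15 neg)": (40, 15),
    "5 more positives (45 pos, 15 neg)": (45, 15),
    "5 fewer negatives (40 pos, 10 neg)": (40, 10),
}
for name, (tp, fp) in rules.items():
    acc = accuracy(tp, fp, N_POS, N_NEG)
    score = wracc(tp, fp, N_POS, N_NEG)
    print(f"{name}: accuracy={acc:.2f}, WRAcc={score:.2f}")
# original rule (40 pos, 15 neg): accuracy=0.65, WRAcc=0.07
# 5 more positives (45 pos, 15 neg): accuracy=0.70, WRAcc=0.09
# 5 fewer negatives (40 pos, 10 neg): accuracy=0.70, WRAcc=0.10
```

Both modified rules reach the same accuracy, whereas WRAcc rewards the removal of
negative instances more than the addition of positive ones.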
Since subgroup discovery aims to characterize differences, this behavior makes
perfect sense: the positive class is overrepresented in the entire data and coverage of
this class has to increase more strongly to be interesting. Given a heavily skewed data
set (e.g. $|D^+| = 0.9\,|D|$), a rule predicting all transactions to belong to the
majority class might be acceptable for a classifier but would be unattractive for
subgroup discovery.
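As a short worked instance of the skewed case, assume (for illustration only) that the
majority-class rule covers every transaction, so that $supp_{D}(r) = |D|$ and
$supp_{D^+}(r) = |D^+| = 0.9\,|D|$. Substituting into the definition of WRAcc gives
\[
\mathrm{WRAcc}(r \Rightarrow C^+) = \frac{|D|}{|D|} \left( \frac{|D^+|}{|D|} - \frac{|D^+|}{|D|} \right) = 1 \cdot (0.9 - 0.9) = 0,
\]
while the accuracy of the same rule is $|D^+| / |D| = 0.9$. Under this assumption the
rule looks strong as a classifier, yet it describes no deviation from the overall class
distribution.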
WRAcc also includes a normalizing factor that weights a rule's score by its effect
size, but this is in fact not particular to subgroup discovery. When it comes to
normalization, the difference between classification and subgroup discovery measures
lies in the motivation: classification wants assurance that mined rules will work on
unseen data, whereas subgroup discovery wants rules to be representative of the data
they have been mined from.
In combination with a minimum support constraint, WRAcc can be used in a class
association rule miner instead of confidence [21]. This idea can be generalized to
other subgroup discovery measures (and the measures listed above), replacing the
confidence measure in class association rule miners by numerous other functions as
proposed by [5]. CMAR, for instance, filters CARs using a $\chi^2$ minimum threshold
in addition to the minimum confidence threshold.
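The idea of exchanging the quality function can be sketched in a few lines of Python.
This is a hedged illustration, not the procedure of any particular miner: the function
select_rules, its parameters, the candidate rules, and the thresholds are all
hypothetical, and support is assumed to be an absolute count of covered instances.

```python
def confidence(tp, fp, n_pos, n_neg):
    """Fraction of covered instances that are positive (class sizes unused)."""
    covered = tp + fp
    return tp / covered if covered else 0.0


def wracc(tp, fp, n_pos, n_neg):
    """Weighted Relative Accuracy: coverage times (confidence minus class prior)."""
    n = n_pos + n_neg
    covered = tp + fp
    if covered == 0:
        return 0.0
    return (covered / n) * (tp / covered - n_pos / n)


def select_rules(candidates, n_pos, n_neg, min_support, quality, min_quality):
    """Keep rules that meet the minimum support (a count, by assumption) and the
    minimum quality, sorted by decreasing quality score."""
    selected = []
    for name, tp, fp in candidates:
        if tp + fp < min_support:
            continue  # minimum support constraint
        score = quality(tp, fp, n_pos, n_neg)
        if score >= min_quality:
            selected.append((name, score))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical candidates as (name, covered positives, covered negatives)
# on a data set with 60 positive and 40 negative instances.
candidates = [("r1", 40, 15), ("r2", 58, 38), ("r3", 3, 0), ("r4", 20, 2)]

by_confidence = select_rules(candidates, 60, 40, 10, confidence, 0.6)
by_wracc = select_rules(candidates, 60, 40, 10, wracc, 0.05)
print([name for name, _ in by_confidence])  # ['r4', 'r1', 'r2']
print([name for name, _ in by_wracc])       # ['r1', 'r4']
```

In this toy setting r2 covers almost the whole data set and passes the confidence
threshold only because its confidence barely exceeds the class prior of 0.6. WRAcc
discards it, since it measures the deviation from that prior.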
That the differences between different types of supervised patterns mainly come
down to a change in quality function has been shown in detail by [32], the authors of
which coined the term “supervised descriptive rule discovery” for such approaches.