[Figure 9.12 here: three panels, (a) Austral, (b) Breast, and (c) Sonar, each plotting InfoGain and IG UpperBound (y-axis, 0 to 1) against Support (x-axis).]
Figure 9.12 Information gain versus pattern frequency (support) for three UCI data sets. A theoretical upper bound on information gain (IG UpperBound) is also shown. Source: Adapted from Cheng, Yan, Han, and Hsu [CYHH07].
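The information gain plotted in Figure 9.12 can be computed empirically for any candidate pattern by splitting the data on whether the pattern is present. A minimal sketch (the function names and toy data are illustrative, not from the source):

```python
from math import log2

def entropy(labels):
    """Shannon entropy H(C) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(pattern, transactions, labels):
    """IG = H(C) - H(C | split on pattern presence/absence)."""
    present = [y for t, y in zip(transactions, labels) if pattern <= t]
    absent  = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (present, absent) if part)
    return entropy(labels) - cond

# Toy data: transactions are item sets, labels are binary classes.
T = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
y = [1, 0, 0, 1]
print(info_gain({'a', 'b'}, T, y))  # pattern {a,b} perfectly separates: 1.0
```

Patterns whose support is very low or very high necessarily have low information gain, which is what bounds the curves in the figure from above.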
incorporated into this step to weed out redundant patterns. The data set D is transformed to D′, where the feature space now includes the single features as well as the selected frequent patterns, F_S.
3. Learning of classification model: A classifier is built on the data set D′. Any learning algorithm can be used as the classification model.
The general framework is summarized in Figure 9.13(a), where the discriminative patterns are represented by dark circles. Although the approach is straightforward, we can encounter a computational bottleneck by having to first find all the frequent patterns and then analyze each one for selection. The number of frequent patterns found can be huge due to the explosive number of pattern combinations among items.
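The transformation of D into D′ described in step 2 amounts to appending one binary indicator feature per selected pattern. A minimal sketch under that reading (names such as `transform` are illustrative):

```python
def transform(transactions, selected_patterns):
    """Build D': one binary column per single item, followed by one
    binary indicator column per selected frequent pattern in F_S."""
    items = sorted(set().union(*transactions))
    rows = []
    for t in transactions:
        row = [int(i in t) for i in items]              # single features
        row += [int(p <= t) for p in selected_patterns]  # pattern features
        rows.append(row)
    return rows

# Toy transactions and one selected pattern.
T = [{'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
F_S = [frozenset({'a', 'b'})]
print(transform(T, F_S))
# columns: a, b, c, {a,b}-indicator
```

Any off-the-shelf learner can then be trained on these rows, as step 3 notes; it never needs to know which columns came from patterns.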
 