CARs for each class are mined, based on the existence of 50 potential significant CARs for each class in R), the average accuracy was found to be 76.79%.
The second set of evaluations undertaken used a confidence threshold value of 50%, a set of decreasing support threshold values from 1 to 0.03%, and the letter recognition dataset. The “large” letter recognition dataset (letRecog.D106.N20000.C26) comprises 20,000 records and 26 pre-defined classes. For the experiment the dataset has been discretised and normalised into 106 binary categories. From the experiment it can be seen that a relationship exists between: the selected value of support threshold (σ or min.support), the number of generated CARs (|R|), the accuracy of classification (Accy), and the time in seconds spent on computation (Time).
Clearly, ↓σ ⇒ ↑|R| ⇒ (↑Accy ∧ ↑Time).
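The stated relation can be checked against the reported figures. The sketch below is only an illustration: it transcribes the one-by-one columns of Table 2 and lists where a series fails to rise as σ falls. The helper name `drops` is hypothetical and not part of the original work.

```python
# Values transcribed from the one-by-one columns of Table 2,
# ordered by decreasing support threshold sigma (in %).
sigma = [1, 0.75, 0.50, 0.25, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]
rules_before = [149, 194, 391, 1118, 2992, 3258, 3630, 3630, 4366, 4897, 5516, 6341]
accuracy = [29.41, 29.94, 35.67, 40.36, 44.95, 45.21, 45.88, 45.88, 46.70, 47.28, 47.67, 48.22]
time_s = [0.080, 0.110, 0.200, 1.052, 4.186, 4.617, 6.330, 6.360, 5.669, 7.461, 9.745, 12.339]


def drops(values):
    """Hypothetical helper: indices i where values[i] is below values[i - 1]."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]


for name, series in [("|R|", rules_before), ("Accy", accuracy), ("Time", time_s)]:
    bad = drops(series)
    if bad:
        print(name, "drops at sigma =", [sigma[i] for i in bad])
    else:
        print(name, "never decreases as sigma falls")
```

On these figures |R| and Accy never decrease, while Time dips once (at σ = 0.06%), so the relation is best read as an overall trend rather than a strict rule at every step.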
Table 2 demonstrates that with a 50% confidence threshold and a value of 1 as the value for k (only the most significant CAR for each class is mined in R), the proposed rule mining approach (in its randomised fashion) performs well with respect to both accuracy of classification and efficiency of computation. When applying the “one-by-one” rule mining approach, as σ decreases from 1 to 0.03%, |R| (before mining the “best k” rules) is increased from 149 to 6,341; and |R| (after mining the “best k” rules and re-ordering all rules) is increased from 167 to 6,367. Consequently accuracy has been increased from 29.41 to 48.22%, and Time (the time spent on mining the k significant rules) has been increased from 0.08 to 12.339 s. In comparison, when applying the proposed randomised rule mining approach with a value of 50 as the value for k (there exist 50 potential significant rules for each class in R), as σ decreases from 1 to 0.03%, |R| (before mining the “best k” rules) is increased from 149 to 6,341; and |R| (after mining the “best k” rules and
Table 2. Computational efficiency and classification accuracy (α = 50%)
Dataset: letRecog.D106.N20000.C26

         One-by-one approach (k = 1)             Randomised selector (k = 1, k = 50)
σ (%)    Rules     Rules     Time     Accuracy   Rules     Rules     Time     Accuracy
         (before)  (after)   (s)      (%)        (before)  (after)   (s)      (%)
1        149       167       0.080    29.41      149       166       0.160    29.60
0.75     194       212       0.110    29.94      194       211       0.160    29.92
0.50     391       415       0.200    35.67      391       411       0.251    35.78
0.25     1118      1143      1.052    40.36      1118      1139      0.641    41.26
0.10     2992      3018      4.186    44.95      2992      3016      0.722    45.18
0.09     3258      3284      4.617    45.21      3258      3282      1.913    45.42
0.08     3630      3656      6.330    45.88      3630      3655      2.183    45.43
0.07     3630      3656      6.360    45.88      3630      3656      2.163    46.02
0.06     4366      4392      5.669    46.70      4366      4391      2.754    46.45
0.05     4897      4923      7.461    47.28      4897      4922      3.235    47.65
0.04     5516      5542      9.745    47.67      5516      5542      3.526    47.53
0.03     6341      6367      12.339   48.22      6341      6365      4.296    48.79
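
To make the efficiency comparison in Table 2 easier to read, the following sketch computes, for each support threshold, the ratio of one-by-one time to randomised time. It is an illustration only: the timings are transcribed from the table, and the function name `speed_ups` is hypothetical.

```python
# Timings (in seconds) transcribed from Table 2, ordered by decreasing sigma (in %).
sigma = [1, 0.75, 0.50, 0.25, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03]
time_one_by_one = [0.080, 0.110, 0.200, 1.052, 4.186, 4.617,
                   6.330, 6.360, 5.669, 7.461, 9.745, 12.339]
time_randomised = [0.160, 0.160, 0.251, 0.641, 0.722, 1.913,
                   2.183, 2.163, 2.754, 3.235, 3.526, 4.296]


def speed_ups(baseline, candidate):
    """Hypothetical helper: baseline time divided by candidate time, per row.

    A ratio above 1 means the candidate (randomised) run was faster.
    """
    return [b / c for b, c in zip(baseline, candidate)]


ratios = speed_ups(time_one_by_one, time_randomised)

# Thresholds at which the randomised selector took less time than one-by-one.
faster_at = [s for s, r in zip(sigma, ratios) if r > 1.0]

for s, r in zip(sigma, ratios):
    print(f"sigma = {s:>4}%  one-by-one / randomised time = {r:.2f}")
print("randomised selector faster at sigma =", faster_at)
```

On the reported timings the one-by-one approach is quicker for σ of 0.50% and above, and the randomised selector is quicker from σ = 0.25% downward.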