Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice


CARs for each class are mined, based on the existence of 50 potential significant CARs for each class in |R|), the average accuracy was found to be 76.79%.

The second set of evaluations used a confidence threshold of 50%, a set of decreasing support threshold values from 1 to 0.03%, and the letter recognition dataset. The “large” letter recognition dataset (letRecog.D106.N20000.C26) comprises 20,000 records and 26 pre-defined classes. For the experiment, the dataset was discretised and normalised into 106 binary categories. The results show a relationship between the selected support threshold (σ or min.support), the number of generated CARs (|R|), the classification accuracy (Accy), and the computation time in seconds (Time). Broadly, ↓σ ⇒ ↑|R| ⇒ (↑Accy ∧ ↑Time), although Table 2 shows a few small exceptions in the Accy and Time columns.
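The first link of the chain, ↓σ ⇒ ↑|R|, follows from the standard definitions of rule support and confidence. The following is a minimal sketch, not the rule generator used in these experiments: it assumes a tiny hypothetical binary dataset (`RECORDS`), a hypothetical helper `mine_cars` that enumerates antecedents of at most two items by brute force, and thresholds expressed as fractions rather than percentages.

```python
from itertools import combinations

# Hypothetical toy data (NOT the letRecog dataset): each record is a set of
# binary attribute items plus a class label.
RECORDS = [
    ({"a", "b"}, "X"),
    ({"a", "c"}, "X"),
    ({"a", "b", "c"}, "Y"),
    ({"b", "c"}, "Y"),
    ({"a"}, "X"),
    ({"c"}, "Y"),
]


def mine_cars(records, min_support, min_confidence, max_len=2):
    """Brute-force enumeration of CARs (antecedent -> class).

    support    = records with the antecedent AND the class / all records
    confidence = records with the antecedent AND the class
                 / records with the antecedent
    Thresholds are fractions (0.5 means 50%).
    """
    n = len(records)
    items = sorted(set().union(*(attrs for attrs, _ in records)))
    classes = sorted({label for _, label in records})
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            ante = set(antecedent)
            # Class labels of every record the antecedent covers.
            covered = [label for attrs, label in records if ante <= attrs]
            if not covered:
                continue
            for cls in classes:
                hits = covered.count(cls)
                support = hits / n
                confidence = hits / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, cls, support, confidence))
    return rules


if __name__ == "__main__":
    # Fixed confidence threshold (50%), decreasing support threshold sigma.
    for sigma in (0.4, 0.3, 0.1):
        print(f"sigma={sigma}: |R| = {len(mine_cars(RECORDS, sigma, 0.5))}")
```

Because every rule that passes a higher σ also passes a lower one, |R| can only grow as σ falls; in this toy run it goes from 2 to 4 to 8 rules at σ = 0.4, 0.3 and 0.1 with the confidence threshold fixed at 0.5. Whether accuracy and time rise with |R| is an empirical question, which is what Table 2 measures.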

Table 2 demonstrates that, with a 50% confidence threshold and k = 1 (only the most significant CAR for each class is mined in |R|), the proposed rule mining approach, in its randomised form, performs well with respect to both classification accuracy and computational efficiency.

When applying the “one-by-one” rule mining approach, as σ decreases from 1 to 0.03%, |R| (before mining the “best k” rules) increases from 149 to 6,341, and |R| (after mining the “best k” rules and re-ordering all rules) increases from 167 to 6,367. Consequently, accuracy increases from 29.41 to 48.22%, and Time (the time spent on mining the k significant rules) increases from 0.08 to 12.339 s. In comparison, when applying the proposed randomised rule mining approach with a value of 50 for k (there exist 50 potential significant rules for each class in |R|), as σ decreases from 1 to 0.03%, |R| (before mining the “best k” rules) increases from 149 to 6,341, and |R| (after mining the “best k” rules and re-ordering all rules) increases from 166 to 6,365.
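The before and after |R| counts in Table 2 can be compared directly. The sketch below (hypothetical names `ROWS`, `rules_added`, `NUM_CLASSES`, `K`) transcribes those columns and computes how many rules the “best k” step adds at each σ; the “before” counts are identical for both approaches in the table, so only one copy is kept. The gap never exceeds 26, i.e. k × the number of classes for k = 1; reading that as at most one added rule per class is an interpretation, not something the text states.

```python
# |R| columns transcribed from Table 2:
# (sigma in %, before, after one-by-one, after randomised).
ROWS = [
    (1.00, 149, 167, 166),
    (0.75, 194, 212, 211),
    (0.50, 391, 415, 411),
    (0.25, 1118, 1143, 1139),
    (0.10, 2992, 3018, 3016),
    (0.09, 3258, 3284, 3282),
    (0.08, 3630, 3656, 3655),
    (0.07, 3630, 3656, 3656),
    (0.06, 4366, 4392, 4391),
    (0.05, 4897, 4923, 4922),
    (0.04, 5516, 5542, 5542),
    (0.03, 6341, 6367, 6365),
]

NUM_CLASSES = 26  # C26: 26 pre-defined classes in letRecog.D106.N20000.C26
K = 1             # k = 1, as listed for both approaches in Table 2


def rules_added(rows):
    """Return (sigma, added by one-by-one, added by randomised) for each row."""
    return [(sigma, one - before, rand - before) for sigma, before, one, rand in rows]


if __name__ == "__main__":
    for sigma, one, rand in rules_added(ROWS):
        print(f"sigma={sigma:.2f}%  one-by-one +{one:2d}  randomised +{rand:2d}")
    bound = K * NUM_CLASSES
    print("all gaps <= k * classes:",
          all(one <= bound and rand <= bound for _, one, rand in rules_added(ROWS)))
```

At σ = 1% the step adds 18 rules (one-by-one) and 17 (randomised); at σ = 0.03% it adds 26 and 24.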

Table 2. Computational efficiency and classification accuracy (α = 50%)

Dataset: letRecog.D106.N20000.C26

         One-by-one approach (k = 1)             Randomised selector (k = 1, k = 50)
σ (%)    Rules     Rules     Time      Accuracy  Rules     Rules     Time      Accuracy
         (before)  (after)   (s)       (%)       (before)  (after)   (s)       (%)
1.00     149       167       0.080     29.41     149       166       0.160     29.60
0.75     194       212       0.110     29.94     194       211       0.160     29.92
0.50     391       415       0.200     35.67     391       411       0.251     35.78
0.25     1118      1143      1.052     40.36     1118      1139      0.641     41.26
0.10     2992      3018      4.186     44.95     2992      3016      0.722     45.18
0.09     3258      3284      4.617     45.21     3258      3282      1.913     45.42
0.08     3630      3656      6.330     45.88     3630      3655      2.183     45.43
0.07     3630      3656      6.360     45.88     3630      3656      2.163     46.02
0.06     4366      4392      5.669     46.70     4366      4391      2.754     46.45
0.05     4897      4923      7.461     47.28     4897      4922      3.235     47.65
0.04     5516      5542      9.745     47.67     5516      5542      3.526     47.53
0.03     6341      6367      12.339    48.22     6341      6365      4.296     48.79
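The Time and Accuracy columns of Table 2 can be summarised the same way. A minimal sketch, with hypothetical helper names `speedup` and `reversals` and data transcribed from the table, that computes the ratio of one-by-one time to randomised time and lists the σ values at which a column moves against the ↓σ ⇒ (↑Accy ∧ ↑Time) trend.

```python
# (sigma %, one-by-one time s, one-by-one accuracy %,
#  randomised time s, randomised accuracy %), transcribed from Table 2.
ROWS = [
    (1.00, 0.080, 29.41, 0.160, 29.60),
    (0.75, 0.110, 29.94, 0.160, 29.92),
    (0.50, 0.200, 35.67, 0.251, 35.78),
    (0.25, 1.052, 40.36, 0.641, 41.26),
    (0.10, 4.186, 44.95, 0.722, 45.18),
    (0.09, 4.617, 45.21, 1.913, 45.42),
    (0.08, 6.330, 45.88, 2.183, 45.43),
    (0.07, 6.360, 45.88, 2.163, 46.02),
    (0.06, 5.669, 46.70, 2.754, 46.45),
    (0.05, 7.461, 47.28, 3.235, 47.65),
    (0.04, 9.745, 47.67, 3.526, 47.53),
    (0.03, 12.339, 48.22, 4.296, 48.79),
]


def speedup(rows):
    """One-by-one time / randomised time per sigma (> 1 means randomised is faster)."""
    return [(row[0], row[1] / row[3]) for row in rows]


def reversals(rows, col):
    """Sigma values at which column `col` drops even though sigma decreased."""
    return [rows[i][0] for i in range(1, len(rows)) if rows[i][col] < rows[i - 1][col]]


if __name__ == "__main__":
    for sigma, ratio in speedup(ROWS):
        print(f"sigma={sigma:.2f}%  one-by-one / randomised time = {ratio:.2f}")
    for col, name in [(1, "one-by-one Time"), (2, "one-by-one Accy"),
                      (3, "randomised Time"), (4, "randomised Accy")]:
        print(f"{name}: reversals at sigma = {reversals(ROWS, col)}")
```

On these figures the randomised selector is slower than the one-by-one approach at σ = 1, 0.75 and 0.50%, and faster from σ = 0.25% downwards (about 2.9 times faster at σ = 0.03%, 12.339 s versus 4.296 s). The only reversals are one-by-one Time at σ = 0.06%, randomised Time at 0.07% and randomised Accy at 0.04%; the |R| columns never decrease as σ falls.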
