Mining Efficiently Significant Classification Association Rules - Data Mining: Foundations and Practice


Table 1. Classification accuracy in % (α = 50%, σ = 1%)

Dataset                      CSA    One-by-one  One-by-one  Randomised      Randomised
                                    k = 1       k = 10      k = 1 (from 5)  k = 10 (from 50)
adult.D97.N48842.C2          80.83  83.87       76.88       81.95           81.85
anneal.D73.N898.C6           91.09  89.31       91.09       90.20           91.31
auto.D137.N205.C7            61.76  64.71       59.80       64.71           58.82
breast.D20.N699.C2           89.11  87.68       89.11       90.83           92.55
connect4.D129.N67557.C3      65.83  66.78       65.87       66.34           66.05
cylBands.D124.N540.C2        65.93  69.63       63.70       67.41           67.78
flare.D39.N1389.C9           84.44  84.01       84.29       84.44           84.29
ionosphere.D157.N351.C2      85.14  84.00       90.29       88.57           93.14
led7.D24.N3200.C10           68.38  62.94       68.38       68.89           69.94
letRecog.D106.N20000.C26     30.29  29.41       31.19       29.36           30.92
mushroom.D90.N8124.C2        99.21  98.45       98.82       99.21           98.45
nursery.D32.N12960.C5        80.35  76.85       76.17       80.20           81.11
pageBlocks.D46.N5473.C5      90.97  91.74       90.97       91.74           90.97
soybean-large.D118.N683.C19  85.92  81.23       86.51       84.46           84.75
ticTacToe.D29.N958.C2        71.61  68.48       72.03       71.19           73.28
waveform.D101.N5000.C3       61.60  58.92       55.96       59.52           57.20
wine.D68.N178.C3             53.93  83.15       71.91       83.15           85.39
zoo.D42.N101.C7              76.00  86.00       78.00       90.00           86.00
Average (all 24 datasets)    73.82  75.29       74.03       76.81           76.79

Rows for which fewer than five values survive (values in their original order; column positions unknown):

glass.D48.N214.C7            58.88  64.49  52.34  64.49
heart.D52.N303.C5            58.28  56.29  60.26  59.60
hepatitis.D56.N155.C2        68.83  66.23  75.32  72.72
horseColic.D85.N368.C2       72.83  77.72  80.43  81.52
iris.D19.N150.C3             97.33
pima.D38.N768.C2             73.18  73.44

(only the most significant CAR for each class is mined) and applying the “one-by-one” rule mining approach, the average classification accuracy across the 24 datasets is 75.29%. When the value of 1 is replaced by a value of 10 (the ten most significant CARs for each class are identified), the average accuracy of the “one-by-one” rule mining approach is 74.03%. Both averages are higher than the 73.82% obtained by the well-established CSA ordering approach. Furthermore, with the randomised-selector-based rule mining approach, choosing a value of 1 for k together with a value of 5 for the second parameter (only the most significant CAR for each class is mined, out of five potential significant CARs for each class) gives an average accuracy of 76.81% across the 24 datasets. Note that in the randomised experiments we always ran several tests (i.e., 8-10 tests) for each dataset and kept the best result. When substituting the value of 1 by a value of 10, and the value of 5 by a value of 50 (the best ten significant
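As a rough illustration of what the parameter k controls, the following Python sketch keeps the k highest-scoring rules for each class. The `Rule` type, its `score` field, the function name `top_k_per_class` and the sample rules are all hypothetical; the excerpt does not say how significance is computed, so this is only a sketch of "best k CARs per class", not the authors' mining procedure.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: frozenset  # items on the left-hand side of the rule
    label: str             # class predicted by the rule
    score: float           # hypothetical significance score (higher = more significant)


def top_k_per_class(rules, k):
    """Return, for each class label, its k highest-scoring rules."""
    by_class = defaultdict(list)
    for rule in rules:
        by_class[rule.label].append(rule)
    return {
        label: sorted(group, key=lambda r: r.score, reverse=True)[:k]
        for label, group in by_class.items()
    }


# Made-up rules, for illustration only.
rules = [
    Rule(frozenset({"a"}), "yes", 0.9),
    Rule(frozenset({"b"}), "yes", 0.4),
    Rule(frozenset({"a", "c"}), "no", 0.7),
    Rule(frozenset({"d"}), "no", 0.8),
]

best = top_k_per_class(rules, k=1)    # one rule per class
wider = top_k_per_class(rules, k=10)  # up to ten rules per class
```

With k = 1 each class keeps a single rule; with k = 10 a class keeps every rule it has when fewer than ten are available.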
