To train the classification model on the features selected from the training
data, we used three of the most commonly used CALs: KNN, naive Bayes (NB), and
support vector machine (SVM). For more information regarding FS methods, feature RSs,
and CALs, see [19]. We used the RapidMiner 4.0 [20] implementations of these
CALs to train and test the classification model. The experimental parameters are
summarized in Table 2.
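As an illustration of what one of these CALs does with the selected features, the following is a minimal sketch of KNN over a Boolean representation. It is not the RapidMiner implementation used in the experiments: the helper names (boolean_vector, hamming, knn_predict), the Hamming distance, the value k = 3, and the toy vocabulary and documents are all assumptions made for this example.

```python
from collections import Counter


def boolean_vector(tokens, vocabulary):
    """Boolean representation: 1 if the term occurs in the document, else 0."""
    present = set(tokens)
    return tuple(1 if term in present else 0 for term in vocabulary)


def hamming(a, b):
    """Number of positions at which two Boolean vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to `query`."""
    nearest = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: the vocabulary stands in for the selected features.
vocabulary = ["goal", "match", "vote", "election"]
docs = [
    (["goal", "match"], "sport"),
    (["match", "goal", "goal"], "sport"),
    (["vote", "election"], "politics"),
    (["election", "vote", "match"], "politics"),
]
train = [(boolean_vector(tokens, vocabulary), label) for tokens, label in docs]

# Classify an unseen document from its Boolean feature vector.
query = boolean_vector(["goal", "match", "vote"], vocabulary)
predicted = knn_predict(train, query, k=3)

if __name__ == "__main__":
    print(predicted)
```

In this sketch the FS step is represented only by the fixed vocabulary; a real run would first rank terms with an FS method and keep the top-scoring ones.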
3 Results and Discussion
Conducting experiments with the parameters in Table 2, we obtained the data in
Table 3.
Table 3 Experimental results

Classifier  Grams  Average  Minimum (FS, RS, terms)      Maximum (FS, RS, terms)
NB          1      60.35    39.25 (IG, LTC, 150)         75 (CHI, TFiDF, 200)
NB          2      50.38    25.88 (CHI, LTC, 200)        66.45 (IG, Boolean, 200)
NB          3      38.76    21.05 (CHI, LTC, 200)        51.32 (DF, Boolean, 200)
NB          4      32.77    18.86 (CHI, LTC, 200)        42.98 (GSS, Boolean, 200)
KNN         1      49.14    40.13 (DF, TFiDF, 100)       58.77 (CHI, LTC, 50)
KNN         2      41.35    28.29 (IG, Boolean, 50)      51.54 (CHI, LTC, 50)
KNN         3      37.63    33.33 (DF, TFiDF, 50)        41.89 (IG, LTC, 150)
KNN         4      33.54    30.48 (IG, Boolean, 50)      36.4 (GSS, LTC, 150)
SVM         1      72.35    67.54 (DF, TFiDF, 50)        75.44 (IG, LTC, 200)
SVM         2      58.73    49.78 (DF, Boolean, 50)      65.13 (IG, LTC, 200)
SVM         3      41.81    35.53 (CHI, Boolean, 50)     47.37 (IG, TFiDF, 200)
SVM         4      35.07    31.58 (CHI, Boolean, 50)     38.6 (GSS, LTC, 100)

The table lists the average CA for each gram number of each classifier, the
minimum and maximum CA for that classifier and gram number, and the combination
of FS method, RS, and number of terms that produced the respective value. The
data suggest that, averaged over all gram numbers, the CA of SVM is greater than
that of NB, followed by that of KNN. In addition, the data suggest that the
greatest CA is achieved when single words are used as features, and that CA
declines by about 17%, on average, with each increase in the gram number.
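The ranking and the decline figure can be checked directly against the Average column of Table 3. The sketch below is one plausible reading, not necessarily the calculation used in the study: it assumes the decline is the relative drop in average CA at each step from n-grams to (n+1)-grams, pooled over the three classifiers. The names AVERAGE_CA, mean_ca, and relative_declines are introduced here for illustration only.

```python
# Average CA per classifier and gram number, copied from Table 3.
AVERAGE_CA = {
    "NB": {1: 60.35, 2: 50.38, 3: 38.76, 4: 32.77},
    "KNN": {1: 49.14, 2: 41.35, 3: 37.63, 4: 33.54},
    "SVM": {1: 72.35, 2: 58.73, 3: 41.81, 4: 35.07},
}


def mean_ca(by_gram):
    """Mean of the average CA over all gram numbers for one classifier."""
    return sum(by_gram.values()) / len(by_gram)


def relative_declines(by_gram):
    """Relative drop in average CA (in %) for each step n -> n + 1."""
    grams = sorted(by_gram)
    return [
        100.0 * (by_gram[n] - by_gram[m]) / by_gram[n]
        for n, m in zip(grams, grams[1:])
    ]


# Rank classifiers by their mean CA across gram numbers (best first).
ranking = sorted(AVERAGE_CA, key=lambda c: mean_ca(AVERAGE_CA[c]), reverse=True)

# Pool the per-step declines of all classifiers and average them
# (assumed interpretation of the "on average" decline).
all_declines = [
    d for by_gram in AVERAGE_CA.values() for d in relative_declines(by_gram)
]
mean_decline = sum(all_declines) / len(all_declines)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name}: mean CA = {mean_ca(AVERAGE_CA[name]):.2f}")
    print(f"mean decline per gram step = {mean_decline:.1f}%")
```

Under this assumption the ranking comes out as SVM, NB, KNN, and the pooled decline is roughly 17% per step, which is consistent with the figures reported above.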
Notably, while SVM achieved the greater CA on average and the best CA using
single words (75.44), NB achieved the best maximum CA for 2-grams, 3-grams, and
4-grams. The data show that the best overall CA was achieved when the number of
terms was 200, the maximum number of terms used in our experiments. According to
the data, NB
worked well with Boolean representation (three of the best results were achieved