Table 10.9. Details of T-test on MCV1, where alpha = 0.001 and degrees of freedom = 34

All vs. TFIDF, alpha = 0.001        |  CBTW1 vs. All, alpha = 0.001
Test      t-Value    t-Critical     |  Test      t-Value    t-Critical
CC        21.509     3.354          |  TFIDF     30.171     3.354
ChiS      19.634                    |  CC        10.688
IG        25.879                    |  ChiS      13.465
OddsR      8.343                    |  IG         6.571
RF        17.038                    |  OddsR     24.003
CBTW1     30.171                    |  RF        13.368
Mean      20.429                    |  Mean      16.378
StdDev     7.536                    |  StdDev     8.882
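The Mean and StdDev rows of Table 10.9 can be reproduced from the individual t-values. The sketch below is only an illustration: the dictionary names and the `summarize` helper are our own, and the use of the sample standard deviation (divisor n - 1) is an assumption that happens to match the printed figures.

```python
from statistics import mean, stdev

# t-values copied from Table 10.9 (MCV1); variable names are illustrative.
all_vs_tfidf = {
    "CC": 21.509, "ChiS": 19.634, "IG": 25.879,
    "OddsR": 8.343, "RF": 17.038, "CBTW1": 30.171,
}
cbtw1_vs_all = {
    "TFIDF": 30.171, "CC": 10.688, "ChiS": 13.465,
    "IG": 6.571, "OddsR": 24.003, "RF": 13.368,
}


def summarize(t_values):
    """Return (mean, sample standard deviation) of the t-values."""
    values = list(t_values.values())
    # statistics.stdev uses the n - 1 divisor (sample standard deviation).
    return mean(values), stdev(values)


if __name__ == "__main__":
    for label, column in (("All vs. TFIDF", all_vs_tfidf),
                          ("CBTW1 vs. All", cbtw1_vs_all)):
        m, s = summarize(column)
        print(f"{label}: mean = {m:.3f}, stddev = {s:.3f}")
```

With the divisor n instead of n - 1 the standard deviations would come out smaller than the tabulated ones, which is why the sample form is assumed here.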
Table 10.10. Details of S-test on Reuters-21578, P(Z >= k) = p-Value, where two F1 values are the same if their difference is not more than 0.01

All vs. TFIDF                       |  CBTW1 vs. All
Test      n     k     p-Value       |  Test      n     k     p-Value
CC        12    11    3.174E-03     |  TFIDF     12    11    3.174E-03
ChiS      12    11    3.174E-03     |  CC        12     8    1.938E-01
IG        11    10    5.859E-03     |  ChiS      11     7    2.744E-01
OddsR     12     8    1.938E-01     |  IG        12     7    3.872E-01
RF        12     9    7.300E-02     |  OddsR     12    10    1.929E-02
CBTW1     12    11    3.174E-03     |  RF        13    10    4.614E-02
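The p-values in Table 10.10 are consistent with a one-sided sign test, in which Z follows a Binomial(n, 0.5) distribution and the p-value is P(Z >= k). The sketch below assumes that reading; the function name `sign_test_p_value` is our own, and interpreting n as the number of comparisons in which the two F1 values differ and k as the number of those won by one scheme is likewise an assumption, not something the table states.

```python
from math import comb


def sign_test_p_value(n, k):
    """One-sided sign test: P(Z >= k) for Z ~ Binomial(n, 0.5)."""
    # Sum the upper tail of the binomial distribution with p = 0.5.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# (n, k) pairs copied from the "CBTW1 vs. All" half of Table 10.10.
rows = {
    "TFIDF": (12, 11), "CC": (12, 8), "ChiS": (11, 7),
    "IG": (12, 7), "OddsR": (12, 10), "RF": (13, 10),
}

if __name__ == "__main__":
    for test, (n, k) in rows.items():
        print(f"{test:6s} n={n:2d} k={k:2d} p={sign_test_p_value(n, k):.3E}")
```

For example, n = 12 and k = 11 gives (C(12, 11) + C(12, 12)) / 2^12 = 13 / 4096, which is the 3.174E-03 entry in the table.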
Table 10.11. Details of T-test on Reuters-21578, where alpha = 0.001 and degrees of freedom = 34

All vs. TFIDF, alpha = 0.001        |  CBTW1 vs. All, alpha = 0.001
Test      t-Value    t-Critical     |  Test      t-Value    t-Critical
CC        19.571     3.467          |  TFIDF     28.893     3.467
ChiS      25.164                    |  CC         8.587
IG        24.352                    |  ChiS       3.682
OddsR      1.692                    |  IG         3.993
RF        11.338                    |  OddsR     22.619
CBTW1     28.893                    |  RF        15.139
Mean      18.501                    |  Mean      13.819
StdDev    10.214                    |  StdDev    10.326
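Reading Table 10.11 amounts to comparing each t-value with the critical value 3.467. The sketch below lists the comparisons that do not exceed it. The names `T_CRITICAL` and `not_significant` are illustrative, and treating "t-value greater than t-critical" as the significance criterion is an assumption about how the table is meant to be read.

```python
# Critical value and t-values copied from Table 10.11 (Reuters-21578).
T_CRITICAL = 3.467

all_vs_tfidf = {
    "CC": 19.571, "ChiS": 25.164, "IG": 24.352,
    "OddsR": 1.692, "RF": 11.338, "CBTW1": 28.893,
}
cbtw1_vs_all = {
    "TFIDF": 28.893, "CC": 8.587, "ChiS": 3.682,
    "IG": 3.993, "OddsR": 22.619, "RF": 15.139,
}


def not_significant(t_values, t_critical):
    """Return the tests whose t-value does not exceed the critical value."""
    return [name for name, t in t_values.items() if t <= t_critical]


if __name__ == "__main__":
    print("All vs. TFIDF:", not_significant(all_vs_tfidf, T_CRITICAL))
    print("CBTW1 vs. All:", not_significant(cbtw1_vs_all, T_CRITICAL))
```

Under this reading, OddsR in the left half is the only comparison that falls below the critical value, while ChiS and IG in the right half clear it by a narrow margin.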
better overall performance can be achieved if CBTW1 is chosen as the weighting scheme.
10.7 Conclusion
Handling imbalanced data sets in TC has emerged as a challenge. In this
chapter, we introduce a new weighting scheme, which is generally formulated as
 