Table 10.9. Details of the t-test on MCV1, where alpha = 0.001 and degrees of freedom = 34

All vs. TFIDF, alpha = 0.001   |  CBTW_1 vs. All, alpha = 0.001
Test      t-Value  t-Critical  |  Test      t-Value  t-Critical
-----------------------------  |  -----------------------------
CC         21.509       3.354  |  TFIDF      30.171       3.354
ChiS       19.634              |  CC         10.688
IG         25.879              |  ChiS       13.465
OddsR       8.343              |  IG          6.571
RF         17.038              |  OddsR      24.003
CBTW_1     30.171              |  RF         13.368
Mean       20.429              |  Mean       16.378
StdDev      7.536              |  StdDev      8.882
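
The Mean and StdDev rows of Table 10.9 can be checked directly from the listed t-values. The following is a minimal sketch, assuming StdDev denotes the sample standard deviation (n - 1 in the denominator), which is the reading that reproduces the printed figures; the dictionary and function names are illustrative and do not come from the chapter.

```python
from statistics import mean, stdev

# t-values copied from Table 10.9 (MCV1).
all_vs_tfidf = {
    "CC": 21.509, "ChiS": 19.634, "IG": 25.879,
    "OddsR": 8.343, "RF": 17.038, "CBTW_1": 30.171,
}
cbtw1_vs_all = {
    "TFIDF": 30.171, "CC": 10.688, "ChiS": 13.465,
    "IG": 6.571, "OddsR": 24.003, "RF": 13.368,
}


def summarize(t_values):
    """Return (mean, sample standard deviation) rounded to 3 decimals."""
    values = list(t_values.values())
    # Assumption: StdDev in the table is the sample standard deviation.
    return round(mean(values), 3), round(stdev(values), 3)


if __name__ == "__main__":
    print("All vs. TFIDF :", summarize(all_vs_tfidf))
    print("CBTW_1 vs. All:", summarize(cbtw1_vs_all))
```

Both summaries agree with the Mean and StdDev rows of the table to three decimal places.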
Table 10.10. Details of the S-test on Reuters-21578, P(Z >= k) = p-Value, where two F1 values are the same if their difference is not more than 0.01

All vs. TFIDF                  |  CBTW_1 vs. All
Test       n    k     p-Value  |  Test       n    k     p-Value
-----------------------------  |  -----------------------------
CC        12   11   3.174E-03  |  TFIDF     12   11   3.174E-03
ChiS      12   11   3.174E-03  |  CC        12    8   1.938E-01
IG        11   10   5.859E-03  |  ChiS      11    7   2.744E-01
OddsR     12    8   1.938E-01  |  IG        12    7   3.872E-01
RF        12    9   7.300E-02  |  OddsR     12   10   1.929E-02
CBTW_1    12   11   3.174E-03  |  RF        13   10   4.614E-02
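
The p-values in Table 10.10 are consistent with a one-sided sign test, in which Z is binomially distributed with n trials and success probability 0.5, and the p-value is P(Z >= k). The sketch below rests on that assumption, since this section does not spell out the computation; the function name `sign_test_p_value` is illustrative.

```python
from math import comb


def sign_test_p_value(n, k):
    """One-sided sign-test p-value P(Z >= k) for Z ~ Binomial(n, 0.5).

    n: number of comparisons in which the two F1 values differ
    k: number of those comparisons won by the first method
    """
    # Sum the upper tail of the binomial distribution with p = 0.5.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    # (n, k) pairs taken from Table 10.10.
    for n, k in [(12, 11), (12, 8), (11, 10), (11, 7), (13, 10)]:
        print(f"n={n:2d} k={k:2d} p-Value={sign_test_p_value(n, k):.3E}")
```

For example, n = 12 and k = 11 give (12 + 1) / 4096, which is the 3.174E-03 listed for CC, ChiS, and CBTW_1 against TFIDF.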
Table 10.11. Details of the t-test on Reuters-21578, where alpha = 0.001 and degrees of freedom = 34

All vs. TFIDF, alpha = 0.001   |  CBTW_1 vs. All, alpha = 0.001
Test      t-Value  t-Critical  |  Test      t-Value  t-Critical
-----------------------------  |  -----------------------------
CC         19.571       3.467  |  TFIDF      28.893       3.467
ChiS       25.164              |  CC          8.587
IG         24.352              |  ChiS        3.682
OddsR       1.692              |  IG          3.993
RF         11.338              |  OddsR      22.619
CBTW_1     28.893              |  RF         15.139
Mean       18.501              |  Mean       13.819
StdDev     10.214              |  StdDev     10.326
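
One way to read Table 10.11 is to compare each t-value with the critical value. The sketch below assumes that the single t-critical value listed (3.467) applies to every row in its group and that a difference counts as significant only when the t-value exceeds it; the names used are illustrative.

```python
# Assumption: the one t-critical value printed in Table 10.11 covers all rows.
T_CRITICAL = 3.467

# t-values copied from Table 10.11 (Reuters-21578).
all_vs_tfidf = {
    "CC": 19.571, "ChiS": 25.164, "IG": 24.352,
    "OddsR": 1.692, "RF": 11.338, "CBTW_1": 28.893,
}
cbtw1_vs_all = {
    "TFIDF": 28.893, "CC": 8.587, "ChiS": 3.682,
    "IG": 3.993, "OddsR": 22.619, "RF": 15.139,
}


def below_critical(t_values, t_critical=T_CRITICAL):
    """List the schemes whose t-value does not exceed the critical value."""
    return [name for name, t in t_values.items() if t <= t_critical]


if __name__ == "__main__":
    print("All vs. TFIDF, not significant :", below_critical(all_vs_tfidf))
    print("CBTW_1 vs. All, not significant:", below_critical(cbtw1_vs_all))
```

Under these assumptions, only OddsR against TFIDF falls below the critical value, while every comparison involving CBTW_1 exceeds it.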
better overall performance can be achieved if CBTW_1 is chosen as the weighting scheme.
10.7 Conclusion
Handling of imbalanced data sets in TC has become an emerging challenge. In this
chapter, we introduce a new weighting scheme which is generally formulated as