Handling of Imbalanced Data in Text Classification: Category-Based Term Weights - Natural Language Processing and Text Mining - page 182


Fig. 10.5. Performance of 16 weighting schemes over 7 minor categories in Reuters-21578, where each of them occupies between 1% and less than 5% of Reuters-21578

both the S-test and T-test, we actually conduct two sets of tests. One is to test all major schemes against TFIDF, and the other is to test CBTW1 against the major schemes. While the first aims to assess the goodness of schemes in the form of TFFVs, the second tests whether CBTWs generate even better results.

Table 10.8. Details of the S-test on MCV1, P(Z >= k) = p-value, where two F1 values are considered the same if their difference is not more than 0.01

          All vs. TFIDF                       CBTW1 vs. All
Test      n     k     p-Value       Test      n     k     p-Value
CC        16    14    2.090E-03     TFIDF     18    16    6.561E-04
ChiS      18    13    4.813E-02     CC        16    12    3.841E-02
IG        18    14    1.544E-02     ChiS      17    14    6.363E-03
OddsR     15    11    5.923E-02     IG        16    11    1.051E-01
RF        18    18    3.815E-06     OddsR     18    16    6.561E-04
CBTW1     18    16    6.561E-04     RF        16    13    1.064E-02
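
The p-values in Table 10.8 can be reproduced with a short sketch, assuming the S-test is the usual one-sided sign test in which Z follows a Binomial(n, 0.5) distribution under the null hypothesis. That assumption is consistent with the tabulated entries (for example, n = 18 and k = 18 gives 0.5^18, about 3.815E-06). The function names `sign_test_p_value` and `count_wins` and the sample F1 values are hypothetical and used for illustration only; the reading of n as the number of non-tied comparisons and k as the number of wins is also an assumption.

```python
from math import comb


def sign_test_p_value(n: int, k: int) -> float:
    """One-sided sign-test p-value P(Z >= k) for Z ~ Binomial(n, 0.5).

    Assumption: n is the number of non-tied comparisons and k is the
    number of comparisons won by the scheme under test.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Sum the upper tail of the binomial distribution with p = 0.5.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def count_wins(f1_a, f1_b, tie_tol=0.01):
    """Count (n, k) from paired F1 values.

    Pairs whose difference is not more than tie_tol are treated as ties
    and dropped, following the rule stated in the table caption.
    """
    n = k = 0
    for a, b in zip(f1_a, f1_b):
        if abs(a - b) <= tie_tol:
            continue  # tie: the two F1 values count as the same
        n += 1
        if a > b:
            k += 1
    return n, k


# Reproduce three entries of Table 10.8 from their (n, k) pairs.
for n, k in [(18, 18), (18, 16), (16, 14)]:
    print(f"n={n} k={k} p={sign_test_p_value(n, k):.3E}")

# Hypothetical paired F1 values, only to show the tie handling.
n, k = count_wins([0.80, 0.75, 0.60, 0.90], [0.70, 0.75, 0.65, 0.895])
print(n, k)
```

A p-value below a chosen significance level (0.05 is a common choice) suggests that the scheme under test wins more often than chance would explain. The tie threshold is exposed as a parameter so that the 0.01 rule can be varied.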
