Fig. 10.7. Details of precision and recall for categories in Reuters-21578
the better approach in handling imbalanced data. Among various TFFVs, CBTW1
claims the leading performance in both MCV1 and Reuters-21578. However, the
approach based on the odds ratio is not much superior to TFIDF. With respect to
the evaluation based on the merits of CBTW1 against the others, it is not surprising
to see that CBTW1 still takes the lead. It manages to perform better than the
approaches based on information gain and chi-square in the t-test where the absolute
difference of F1 values is considered. Furthermore, CBTW1 always achieves better
results than RF in both data sets. This demonstrates the contribution of A/C in
handling imbalanced data sets. Finally, the strengths of information gain, chi-square
and correlation coefficient shown in the tests are compatible with what is in the
literature [14, 40, 47]. In general, the more minor categories the data set possesses, the