Fig. 10.7. Details of precision and recall for categories in Reuters-21578
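As a reminder of how the per-category figures behind such a plot are usually obtained, the following is a minimal sketch that computes precision, recall and F1 from true-positive, false-positive and false-negative counts. The function name, the category names and the counts are hypothetical and are not taken from the figure or from the experiments reported here.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one category from raw counts."""
    # Guard against empty denominators, which occur for very small categories.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only: category -> (TP, FP, FN).
counts = {"major": (90, 10, 10), "minor": (3, 1, 7)}
scores = {name: precision_recall_f1(*c) for name, c in counts.items()}

for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this made-up example the minor category keeps a reasonable precision but loses most of its recall, which is the typical pattern when a classifier is trained on imbalanced data.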
the better approach in handling imbalanced data. Among various TFFVs, CBTW1
claims the leading performance in both MCV1 and Reuters-21578. However, the
approach based on the odds ratio is not much superior to TFIDF. With respect to
the evaluation based on the merits of CBTW1 against the others, it is not
surprising to see that CBTW1 still takes the lead. It manages to perform better
than the approaches based on information gain and chi-square in the T-test,
where the absolute difference of F1 values is considered. Furthermore, CBTW1
always achieves better results than RF in both data sets. This demonstrates the
contribution of A/C in handling imbalanced data sets. Finally, the strengths of
information gain, chi-square and correlation coefficient shown in the tests are
compatible with what is in the literature [14, 40, 47]. In general, the more
minor categories the data set possesses, the