[Pie chart: neutral 86 %, positive 8 %, negative 5 %, mixed 1 %]
Fig. 1.6 The corpus for evaluating the sentiment analysis approach is highly unbalanced. It consists
of 742 neutral, 71 positive, 38 negative, and 10 mixed quotations
have to deal with a highly unbalanced corpus. We discard the 1 % of quotations
annotated as mixed because we do not aim to classify such quotations.
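To make the imbalance concrete, the counts given in the caption of Fig. 1.6 can be turned into class shares of the corpus that remains once the mixed quotations are discarded. The following is a small illustrative sketch; the variable names are assumptions made for this example and do not come from the original work.

```python
# Class counts as stated in the caption of Fig. 1.6.
counts = {"neutral": 742, "positive": 71, "negative": 38, "mixed": 10}

# Drop the mixed quotations, as described in the text.
kept = {label: n for label, n in counts.items() if label != "mixed"}
total = sum(kept.values())

# Share of each remaining class in the reduced corpus.
shares = {label: n / total for label, n in kept.items()}
for label, share in shares.items():
    print(f"{label}: {kept[label]} of {total} ({share:.1%})")

# The largest class and its share show how skewed the corpus is.
majority_label = max(kept, key=kept.get)
majority_share = shares[majority_label]
print(f"majority class: {majority_label} ({majority_share:.1%})")
```

The remaining three classes sum to 851 quotations, and a classifier that always predicted the majority class would already be right for 742 of them, which is why per-class scores matter on such a corpus.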
1.4.6 Evaluation
We conduct our experiments on a human-annotated corpus of 851 quotations tagged
as positive, negative, or neutral. We first evaluate each classifier of our two-stage
approach separately and then assess the performance of the overall sentiment
classification. In our experiments we first examine our sentiment features
individually and then test whether combining them helps to solve the tasks of
subjectivity and polarity classification (see footnote 26). We measure the
effectiveness of our approach by precision, recall, and the F1-score, the harmonic
mean of precision and recall. We consider all classes equally important, determine
the evaluation scores for each class separately, and then macro-average the scores
across the classes. Our evaluation is performed as tenfold cross-validation, where in
each evaluation run 90 % of the data is used to train the classifier and the
remaining 10 % to test it. Within the 10 folds the class distribution of the
quotations is preserved. We normalize the feature values to fit into the interval
[0, 1]. In each run we perform a nested tenfold cross-validation to find
26 We skip the evaluation of our target extraction solution because of the lack of text anchors for
targets in our corpus. The corpus contains only a few target annotations because in most cases the
targets are abstract topics or expressions of sentiment rather than entities or nouns, and therefore
the annotators could not mark them within the quotation.
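The evaluation protocol described in this section (folds that keep the class distribution, feature values scaled to [0, 1], and macro-averaged precision, recall, and F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `stratified_folds`, `min_max_normalize`, and `macro_scores` are hypothetical, the round-robin fold assignment is just one simple way to keep class proportions, and the text does not state whether the scaling parameters are computed on the training portion only.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Assign item indices to k folds class by class (round-robin),
    so each fold keeps roughly the corpus-wide class distribution."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def min_max_normalize(values):
    """Linearly rescale one feature's values into the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def macro_scores(gold, predicted, classes):
    """Compute precision, recall, and F1 per class, then return their
    unweighted means, so every class counts equally."""
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Usage: a toy sample in which every item is predicted as neutral.
classes = ["neutral", "positive", "negative"]
gold = ["neutral"] * 8 + ["positive"] + ["negative"]
predicted = ["neutral"] * 10
precision, recall, f1 = macro_scores(gold, predicted, classes)
print(f"macro P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the toy usage, the always-neutral prediction is correct for 8 of 10 items, yet its macro-averaged scores are low because the positive and negative classes each contribute zero. This is the effect of treating all classes as equally important on an unbalanced corpus.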