Information Technology Reference
In-Depth Information
The customer reviews discussed so far are predominantly medium-length or long texts.
With the widespread adoption of social media services such as Twitter and
Facebook in recent years, part of the sentiment analysis work has shifted toward the
analysis of short user-generated texts. The language used in such texts poses
new challenges for the task. Users often write colloquially and
pay little attention to spelling and punctuation. In addition, the texts are mostly very
short and consist of phrases rather than complete sentences. On Twitter, for example, a
short message must not exceed 140 characters.
Go et al. [18] were among the first to classify English-language tweets relating to a
query as either positive or negative. They adopt a supervised approach and compare several
classifiers: Naive Bayes, Maximum Entropy, and Support Vector
Machine (SVM). To train the classifiers, the authors propose using
tweets containing positive or negative emoticons (mapped to ':)' and ':(', respectively) as noisily
labeled training data. The authors explore a range of standard text classification features,
such as word unigrams, bigrams, and part-of-speech tags, for representing the tweets.
Having evaluated their approach on manually tagged tweets from different categories
(177 negative and 182 positive tweets, chosen independently of emoticons), Go et al.
conclude that the automatically created training dataset is suitable for training the
examined algorithms, which solve the task reasonably well. Using a combination of word
unigrams and bigrams, the Maximum Entropy classifier achieves an accuracy of 83 %. However,
the differences between the classifiers and feature sets are small.
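
The idea of emoticon-based noisy labeling combined with unigram and bigram features can be illustrated with a minimal Python sketch. It is not the implementation of Go et al.: the emoticon lists, the toy tweets, the tokenizer, and all function and class names (noisy_label, features, NaiveBayes) are assumptions made for illustration, and a small multinomial Naive Bayes with add-one smoothing stands in for the classifiers compared in the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emoticon lists; the actual sets used in the literature may differ.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(")


def noisy_label(text):
    """Return 'positive' or 'negative' based on emoticons, or None if unclear."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    if has_pos == has_neg:  # no emoticon at all, or conflicting emoticons
        return None
    return "positive" if has_pos else "negative"


def features(text):
    """Word unigrams plus bigrams; emoticons are dropped by the tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(text):
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_from_raw(raw_tweets):
    """Keep only tweets with an unambiguous emoticon label and train on them."""
    labeled = [(t, noisy_label(t)) for t in raw_tweets]
    labeled = [(t, y) for t, y in labeled if y is not None]
    texts, labels = zip(*labeled)
    return NaiveBayes().fit(texts, labels)


# Toy data, invented for this sketch.
RAW_TWEETS = [
    "I love this phone :)",
    "great battery and great screen :)",
    "what a lovely day :-)",
    "I hate this phone :(",
    "terrible battery, awful screen :(",
    "worst day ever :-(",
    "just landed in Berlin",  # no emoticon, so it is discarded
]

if __name__ == "__main__":
    model = train_from_raw(RAW_TWEETS)
    print(model.predict("love the great screen"))
    print(model.predict("awful battery, I hate it"))
```

In this sketch the emoticons serve only as labels and never as features, so the classifier has to rely on the remaining words. Tweets without an emoticon, or with conflicting ones, are simply left out of the training set.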
Compared with customer reviews, news articles may express opinions less
explicitly. Since journalists (ought to) write objectively and avoid emotional language,
identifying the implied opinions is especially challenging. In addition, the
opinion holder must be extracted: unlike a customer review, a news article typically conveys not the
author's opinion but the opinions of the people and
organizations it covers. In 2006, Kim and Hovy [23] approached
opinion mining in English news articles by proposing a four-stage system
that extracts opinions, determines the opinion topic, and assigns an opinion
holder by applying semantic role labeling. The system separates subjective from
objective sentences, performs semantic role labeling using opinion-related frames
and frame elements from FrameNet, 17 and chooses the opinion target and holder from
the semantic roles. Finally, the extracted opinion triples, each consisting of holder,
topic, and opinion, are stored in a database.
The work of Nakagawa et al. [33] addresses sentiment classification
at the sentence level. The authors use conditional random fields with hidden variables
that represent the polarity of dependency subtrees to infer the polarity of an entire
subjective sentence. The approach was evaluated on English and Japanese opinion
texts with promising results. Among other datasets, it was evaluated on Japanese news articles,
where it reached an accuracy of 83 %, which indicates its effectiveness on this text type. However,
the work is based on subjective sentences and skips the task of subjectivity detection.
Strongly related to our work is that of Balahur et al. [2-4]. The authors apply
sentiment analysis to news articles. Although the team mainly explores approaches
17 https://framenet.icsi.berkeley.edu/fndrupal/