Intelligent News Aggregator for German with Sentiment Analysis - Smart Information Systems: Computational Intelligence for Real-Life Applications

The customer reviews discussed so far are predominantly medium-length or long texts. With the widespread adoption of social media services such as Twitter and Facebook in recent years, part of the sentiment analysis research has shifted toward the analysis of short, user-generated texts. The language used in such texts poses new challenges. Users often write colloquially and pay little attention to spelling and punctuation. In addition, most of these texts are very short and consist of phrases rather than complete sentences. On Twitter, for example, a message could not exceed 140 characters at the time of writing.

Go et al. [18] were among the first to classify English-language tweets returned for a given query as either positive or negative. They adopt a supervised approach and compare several classifiers: Naive Bayes, Maximum Entropy, and a Support Vector Machine (SVM). To train the classifiers, the authors propose using tweets that contain positive or negative emoticons (mapped to ':)' and ':(', respectively) as noisily labeled training data. To represent the tweets, they explore a range of standard text classification features such as word unigrams, word bigrams, and part-of-speech tags. They evaluate their approach on manually annotated tweets from different categories (177 negative and 182 positive tweets, selected regardless of whether they contain emoticons). From this evaluation, Go et al. conclude that the automatically labeled training data is suitable for training the examined algorithms, all of which perform reasonably well. Using a combination of word unigrams and bigrams, the Maximum Entropy classifier achieves an accuracy of 83 %. However, the differences between the classifiers and between the feature sets are small.
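
The distant-supervision idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Go et al.'s implementation. The emoticon sets, the toy tweets, the tokenizer, and the helper names (`noisy_label`, `strip_emoticons`, `features`, `NaiveBayes`) are all invented for this example. Tweets that contain only positive or only negative emoticons receive a noisy label. The emoticons are then stripped so that the classifier has to rely on the remaining words. Finally, a multinomial Naive Bayes classifier with add-one smoothing is trained on word unigram plus bigram features.

```python
import math
import re
from collections import Counter

# Assumed emoticon variants; each set plays the role of ':)' or ':(' respectively.
POSITIVE_EMOTICONS = {":)", ":-)", ": )", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ": ("}


def noisy_label(tweet):
    """Label a tweet 'pos' or 'neg' from its emoticons; None if absent or mixed."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def strip_emoticons(tweet):
    """Remove the label-bearing emoticons so the classifier must use the words."""
    for e in POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS:
        tweet = tweet.replace(e, " ")
    return tweet


def features(text):
    """Word unigrams plus bigrams from a lowercased, crudely tokenized text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(features(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        feats = [f for f in features(doc) if f in self.vocab]  # drop unseen features

        def score(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                math.log((self.counts[c][f] + 1) / denom) for f in feats
            )

        return max(self.classes, key=score)


# Toy stand-ins for a large stream of raw tweets.
raw_tweets = [
    "love this phone :)",
    "great game tonight :-)",
    "so happy with the new update :D",
    "my flight got cancelled :(",
    "this phone is so slow :-(",
    "worst service ever :(",
    "just landed in berlin",  # no emoticon: not used for training
]
labeled = [(strip_emoticons(t), noisy_label(t)) for t in raw_tweets]
labeled = [(t, y) for t, y in labeled if y is not None]
docs, labels = map(list, zip(*labeled))
model = NaiveBayes().fit(docs, labels)
print(model.predict("love the new update"))  # pos
print(model.predict("flight is so slow"))    # neg
```

Discarding tweets with mixed emoticons is a design choice of this sketch. With real data, the automatically labeled training set would be far larger than six tweets, which is what makes the noisy labels usable.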

Compared with customer reviews, news articles tend to express opinions less explicitly. Because journalists (are expected to) write objectively and avoid emotional language, identifying the implied opinions is especially challenging. In addition, the opinion holder must be extracted. Unlike a customer review, a news article usually conveys the opinions of the people and organizations it reports on, not the opinion of its author. In 2006, Kim and Hovy [23] approached opinion mining in English news articles with a four-stage system that extracts opinions, determines their topics, and assigns their holders by means of semantic role labeling. The system separates subjective from objective sentences and performs semantic role labeling using opinion-related frames and frame elements from FrameNet (footnote 17). It then selects the opinion topic and holder from the resulting semantic roles. Finally, the extracted opinion triples, each consisting of holder, topic, and opinion, are stored in a database.
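
The last stages of such a pipeline keep subjective sentences, map semantic roles to holder and topic, and store the resulting triples. The following hypothetical sketch shows how these steps might fit together. It assumes that subjectivity classification and semantic role labeling have already run. The sentences and their frame-element labels are invented inputs, and the `HOLDER_ROLES`/`TOPIC_ROLES` mapping is an illustrative assumption rather than Kim and Hovy's actual rule set. Python's built-in `sqlite3` module stands in for the database.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical mapping from FrameNet-style frame elements to opinion slots.
HOLDER_ROLES = {"Cognizer", "Communicator", "Experiencer", "Speaker"}
TOPIC_ROLES = {"Evaluee", "Message", "Topic"}


@dataclass
class OpinionTriple:
    holder: str
    topic: str
    opinion: str


def extract_triple(opinion_word, roles):
    """Pick holder and topic from role -> text-span output of an (assumed) SRL step."""
    holder = next((span for fe, span in roles.items() if fe in HOLDER_ROLES), None)
    topic = next((span for fe, span in roles.items() if fe in TOPIC_ROLES), None)
    if holder is None or topic is None:
        return None  # incomplete opinion: no triple
    return OpinionTriple(holder, topic, opinion_word)


def store(conn, triples):
    """Persist triples in an 'opinions' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS opinions (holder TEXT, topic TEXT, opinion TEXT)"
    )
    conn.executemany(
        "INSERT INTO opinions VALUES (?, ?, ?)",
        [(t.holder, t.topic, t.opinion) for t in triples],
    )
    conn.commit()


# Invented pre-processed sentences: a subjectivity flag plus SRL output.
sentences = [
    {"subjective": True, "opinion_word": "criticized",
     "roles": {"Communicator": "the union", "Evaluee": "the pension reform"}},
    {"subjective": False, "opinion_word": None, "roles": {}},
    {"subjective": True, "opinion_word": "feared",
     "roles": {"Experiencer": "investors", "Topic": "a recession"}},
]

triples = []
for s in sentences:
    if not s["subjective"]:
        continue  # objective sentences are filtered out first
    triple = extract_triple(s["opinion_word"], s["roles"])
    if triple is not None:
        triples.append(triple)

conn = sqlite3.connect(":memory:")
store(conn, triples)
rows = conn.execute("SELECT holder, topic, opinion FROM opinions ORDER BY rowid").fetchall()
print(rows)
```

Storing the triples in a table makes later aggregation straightforward, for example counting opinions per holder and topic with a plain SQL `GROUP BY`.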

Nakagawa et al. [33] address sentiment classification at the sentence level. They use conditional random fields (CRFs) with hidden variables, which represent the polarities of dependency subtrees, to infer the polarity of entire subjective sentences. The approach was evaluated on English and Japanese opinion texts with promising results. Among other data, it was evaluated on Japanese news articles, where it achieved an accuracy of 83 %, which demonstrates its effectiveness on this text type. However, the approach assumes that the input sentences are already subjective and does not address subjectivity detection.
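
The core inference idea can be illustrated with a simplified tree model in which each dependency subtree carries a hidden polarity (+1 or -1). Under this model, the sentence polarity is the marginal of the root variable after all other hidden polarities are summed out by upward message passing. This is a minimal sketch, not Nakagawa et al.'s model. The weights `W_NODE` and `W_EDGE` are hand-set instead of learned, and each word has a single hypothetical lexicon `prior` and a hypothetical `reverses` flag instead of the paper's feature set. No CRF training is performed. The sketch also assumes that exactly one node has `head == -1`.

```python
import math
from dataclasses import dataclass

W_NODE = 1.0  # hand-set weights; a CRF would learn these
W_EDGE = 1.0
POLARITIES = (+1, -1)


@dataclass
class Node:
    word: str
    head: int               # index of the head word, -1 for the root
    prior: float            # hypothetical lexicon score (<0 negative, >0 positive)
    reverses: bool = False  # hypothetical polarity-shifter flag


def node_pot(node, p):
    # Favors a subtree polarity p that agrees with the word's prior.
    return math.exp(W_NODE * node.prior * p)


def edge_pot(child_p, head_p, head_reverses):
    # Head and modifier subtrees agree unless the head reverses polarity.
    sign = -1 if head_reverses else 1
    return math.exp(W_EDGE * sign * child_p * head_p)


def sentence_polarity(nodes):
    """P(root subtree polarity = +1), summing out all other hidden polarities."""
    children = {i: [] for i in range(len(nodes))}
    root = next(i for i, n in enumerate(nodes) if n.head == -1)
    for i, n in enumerate(nodes):
        if n.head != -1:
            children[n.head].append(i)

    def message(i):
        # Upward message from node i to its head, indexed by the head's polarity.
        head_rev = nodes[nodes[i].head].reverses
        incoming = [message(c) for c in children[i]]
        out = {}
        for hp in POLARITIES:
            total = 0.0
            for p in POLARITIES:
                val = node_pot(nodes[i], p) * edge_pot(p, hp, head_rev)
                for m in incoming:
                    val *= m[p]
                total += val
            out[hp] = total
        return out

    root_msgs = [message(c) for c in children[root]]
    belief = {}
    for p in POLARITIES:
        b = node_pot(nodes[root], p)
        for m in root_msgs:
            b *= m[p]
        belief[p] = b
    return belief[+1] / (belief[+1] + belief[-1])


# "It prevents cancer" vs. "It causes cancer"; "cancer" carries a negative prior.
prevents = [Node("It", 1, 0.0), Node("prevents", -1, 0.0, reverses=True), Node("cancer", 1, -1.0)]
causes = [Node("It", 1, 0.0), Node("causes", -1, 0.0), Node("cancer", 1, -1.0)]
print(round(sentence_polarity(prevents), 2))  # about 0.79: positive
print(round(sentence_polarity(causes), 2))    # about 0.21: negative
```

The example shows why subtree-level hidden variables help. The negative word "cancer" makes the whole sentence negative under "causes", but the reversing head "prevents" flips the subtree polarity to positive.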

Closely related to our work is that of Balahur et al. [2-4], who apply sentiment analysis to news articles. Although the team mainly explores approaches

17 https://framenet.icsi.berkeley.edu/fndrupal/
