Intelligent News Aggregator for German with Sentiment Analysis - Smart Information Systems: Computational Intelligence for Real-Life Applications


to find the most suitable feature set for both the subjectivity detection and polarity

classification task. We train and evaluate our approach on a human-annotated corpus

of German quotations. The corpus consists of 742 neutral, 71 positive, and 38 negative

quotations. It can be made available for research purposes after signing an agreement.
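The class counts given above imply a heavily skewed label distribution. As a minimal sketch, the following computes the shares from those counts alone. The majority-class baseline is added here only as an illustrative reference point; it is not a figure reported in the text.

```python
# Class counts as stated in the corpus description.
counts = {"neutral": 742, "positive": 71, "negative": 38}

total = sum(counts.values())  # 851 quotations overall
shares = {label: n / total for label, n in counts.items()}

# Illustrative assumption: a trivial classifier that always predicts the
# most frequent class ("neutral") would reach this accuracy (about 0.872).
majority_label = max(counts, key=counts.get)
majority_baseline = counts[majority_label] / total

# Quotations relevant to the polarity step (positive or negative only).
subjective = counts["positive"] + counts["negative"]

for label, share in shares.items():
    print(f"{label}: {share:.3f}")
print(f"majority baseline ({majority_label}): {majority_baseline:.3f}")
print(f"subjective quotations: {subjective} of {total}")
```

Only 109 of the 851 quotations carry a polarity label, so the polarity classifier has far fewer examples to learn from than the subjectivity detector.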

1.4.2 Related Work

Related research on sentiment analysis varies from simple lexicon-based approaches

looking up words in opinion lexicons to supervised approaches exploiting linguistic

features and enhanced machine learning algorithms. Most of this work concentrates on

the classification of text as either positive, negative or neutral toward a specific

entity or topic explicitly mentioned in the text.

Sentiment analysis treats texts at different levels. There is work examining entire

document texts like entire reviews or news articles, attempting to predict the overall

sentiment of a document text [ 37 , 49 ]. But there is also work that performs sentiment

analysis at statement level [ 5 , 18 , 46 ], sentence level [ 23 , 33 , 43 , 54 ] or even phrase

level [ 49 ]. Often, sentiment analysis work on reviews also aims at extracting product

properties and the opinions toward these properties, which is called aspect-oriented

sentiment analysis [ 21 ].

Sentiment analysis has also been applied to different text types. A great part of

the work examines customer reviews, like product [ 21 , 49 ] or movie [ 37 ] reviews.

Since reviews are meant to share experiences and report opinions, they contain many

subjective text parts and are therefore well suited to sentiment analysis. Yet, reviews

can also contain objective parts summarizing the properties of the reviewed entities.

Regarding movie reviews, one challenge is to separate plot information, which itself

may be characterized as positive or negative, from opinions toward the movie. All

work treating customer reviews must handle challenges arising from user-generated

content such as potential spelling mistakes and grammatical errors.

Early work in classifying product reviews used lexicon-based techniques together

with natural language processing algorithms in order to create opinion

summarization. Hu and Liu [ 21 ] propose a three-stage approach to aspect-based opinion

summarization. They first search for product features in customer reviews by applying

association mining with some pruning. Then, the authors determine the polarity of

sentences mentioning the features. Whether a sentence is classified as positive

or negative depends on the orientations of the individual opinion words (adjectives)

in the sentence, which are summed up to an overall orientation. The orientation of

opinion words is pre-calculated from a list of seed adjectives, expanded using

WordNet's information on synonyms and antonyms. Similar to Hu and Liu,

Turney [ 49 ] categorizes product reviews as either 'recommended' or 'not recommended'

by calculating the average sentiment orientation of the review's phrases.

Turney calculates the orientation of phrases containing adjectives and adverbs by

determining the pointwise mutual information between a phrase and each of the words

“excellent” and “poor”, then subtracting the latter value from the former to obtain a

final sentiment orientation score.
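The two scoring schemes described in this paragraph can be sketched as follows. This is a simplified illustration, not the original authors' implementations: the lexicon entries, the function names, the co-occurrence counts, and the smoothing constant are all assumptions made for the example. Negation handling and the tie-breaking rules of the original systems are left out.

```python
import math

# Hypothetical opinion lexicon: +1 for positive, -1 for negative adjectives.
# In the approach of Hu and Liu these orientations are derived from seed
# adjectives and WordNet; here they are hard-coded for illustration.
OPINION_LEXICON = {"great": 1, "sharp": 1, "reliable": 1,
                   "blurry": -1, "noisy": -1, "flimsy": -1}


def sentence_orientation(tokens, lexicon=OPINION_LEXICON):
    """Sum the orientations of the opinion words in a tokenized sentence."""
    score = sum(lexicon.get(tok.lower(), 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "undecided"  # simplification: no tie-breaking rule is applied


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information log2(p(a, b) / (p(a) * p(b)))."""
    return math.log2((joint / total) / ((count_a / total) * (count_b / total)))


def semantic_orientation(near_excellent, near_poor, n_phrase,
                         n_excellent, n_poor, total, smooth=0.01):
    """PMI(phrase, "excellent") minus PMI(phrase, "poor") from raw counts.

    `smooth` is an assumed small constant added to the co-occurrence counts
    so that a phrase never seen near one of the words does not yield log(0).
    """
    pmi_pos = pmi(near_excellent + smooth, n_phrase, n_excellent, total)
    pmi_neg = pmi(near_poor + smooth, n_phrase, n_poor, total)
    return pmi_pos - pmi_neg


def classify_review(phrase_scores):
    """Average the phrase orientations and map the sign to a label."""
    if not phrase_scores:
        raise ValueError("a review needs at least one scored phrase")
    average = sum(phrase_scores) / len(phrase_scores)
    return "recommended" if average > 0 else "not recommended"


if __name__ == "__main__":
    print(sentence_orientation("The picture is sharp and reliable".split()))
    # Made-up counts: the phrase co-occurs more often with "excellent".
    so = semantic_orientation(near_excellent=8, near_poor=2, n_phrase=100,
                              n_excellent=1000, n_poor=1000, total=1_000_000)
    print(classify_review([so, -0.5]))
```

Because the phrase count and the corpus size appear in both PMI terms, they cancel in the difference. The score therefore depends only on how often the phrase occurs near each reference word, relative to how frequent each reference word is overall.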
