patterns to the quotation's text and augment our feature vector by adding a negated
form of the SentiWS term, written as NOT_SentiWS_term, to the vector. We also
invert the SentiWS value of the negated term if the weighting scheme requires it.
In the example above we add NOT_gut (not good) to our term vector and treat
the term as negative when calculating our aggregated features.
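
The negation step described above might be sketched as follows. This is only an illustration under stated assumptions: the lexicon values, the negation pattern, and the function name negated_features are hypothetical and are not taken from SentiWS or from the original system.

```python
import re

# Hypothetical mini-lexicon; the values are made up and are NOT real SentiWS weights.
SENTIWS = {"gut": 0.37, "schlecht": -0.77}

# Assumed negation pattern: a negation word directly followed by one term.
NEGATION = re.compile(r"\b(nicht|keine|kein|nie)\s+(\w+)", re.IGNORECASE)


def negated_features(text, invert=True):
    """Return {"NOT_<term>": value} for every negated lexicon term in text."""
    features = {}
    for match in NEGATION.finditer(text):
        term = match.group(2).lower()
        if term in SENTIWS:
            value = SENTIWS[term]
            # Invert the polarity only if the weighting scheme asks for it.
            features["NOT_" + term] = -value if invert else value
    return features


print(negated_features("Das ist nicht gut"))  # {'NOT_gut': -0.37}
```

The invert flag stands in for the remark that the value is inverted only if the weighting scheme requires it; a purely binary weighting would leave it off.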
Valence Shifters. Valence shifters are words or phrases such as “nicht” (not), “höchst”
(extremely), or “weniger” (less) that change the intensity or polarity of other lexical
items. We distinguish between three types of valence shifters: negations, diminishers,
and intensifiers. Previous work examined the effect of valence shifters on sentiment
classification of movie reviews and concluded that incorporating valence shifters
slightly increases classification accuracy [22]. To create our features, we
use a list of around 100 valence shifters derived from the MLSA Corpus, a
multi-layered reference corpus for German-language sentiment analysis [9]. The corpus
consists of three layers with sentiment annotations at different levels of granularity.
Layer 2 provides polarity-related annotations for words and phrases. At the phrase
level, text spans are labeled as positive, negative, bipolar, or neutral. Words are
additionally labeled as diminishers, intensifiers, or shifters (negations). From these
annotations we compile feature vectors in the form of a bag of valence shifters, plus
derived features that accumulate the counts for each of the three types.
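
A minimal sketch of the bag-of-valence-shifters and the accumulated per-type features could look like this. The word list is a hypothetical excerpt (only “nicht”, “höchst”, and “weniger” come from the text; the other entries and their types are assumptions), and the function name is illustrative.

```python
from collections import Counter

# Hypothetical excerpt of a valence shifter list; the list described in the
# text has around 100 entries. "sehr", "kein" and "kaum" are assumed entries.
VALENCE_SHIFTERS = {
    "nicht": "negation",
    "kein": "negation",
    "höchst": "intensifier",
    "sehr": "intensifier",
    "weniger": "diminisher",
    "kaum": "diminisher",
}


def valence_shifter_features(tokens):
    """Return (bag of valence shifters, accumulated counts per shifter type)."""
    lowered = [token.lower() for token in tokens]
    bag = Counter(token for token in lowered if token in VALENCE_SHIFTERS)
    totals = {"negation": 0, "intensifier": 0, "diminisher": 0}
    for word, count in bag.items():
        totals[VALENCE_SHIFTERS[word]] += count
    return dict(bag), totals


tokens = "Das ist nicht sehr gut , sondern höchst ärgerlich".split()
bag, totals = valence_shifter_features(tokens)
print(bag)     # {'nicht': 1, 'sehr': 1, 'höchst': 1}
print(totals)  # {'negation': 1, 'intensifier': 2, 'diminisher': 0}
```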
Discourse Markers. Discourse markers are words or phrases that connect sentences
or sentence parts and thereby express the semantic relations between them.
Examples are “weil, aber, abgesehen davon dass, sogar, dennoch…” (because, but,
apart from the fact that, even, however…). The use of discourse markers may influence
the orientation or intensity of sentiments, as in the quotation “Wir sind zufrieden
mit dem Stand der Dinge, aber wir wollen mehr”, sagte Vettel. (“We are happy with
the situation, but we want more”, Vettel said.) In our approach we search quotations
for discourse markers from a predefined list. The list is derived from the online
lexicon for German grammar 23 of the “Institut für Deutsche Sprache” 24 (IDS, Institute
for German Language). It contains around 350 discourse markers of different types.
The resulting feature vector covers all discourse markers and makes no distinction
between the types. We assign each marker a value (occurrence flag and term
frequency) and also encode whether the quotation contains any discourse markers and
how many.
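
The discourse marker features might be computed roughly as in the sketch below. The marker list contains only the five examples given in the text, not the roughly 350 entries of the IDS-derived list, and the feature names (DM_flag_, DM_tf_, DM_any, DM_count) are hypothetical.

```python
import re

# Only the example markers from the text; the real list has around 350 entries.
DISCOURSE_MARKERS = ["weil", "aber", "abgesehen davon dass", "sogar", "dennoch"]


def discourse_marker_features(text, markers=DISCOURSE_MARKERS):
    """Per-marker occurrence flag and term frequency, plus two aggregates."""
    lowered = text.lower()
    features = {}
    total = 0
    for marker in markers:
        # Match the marker as a whole word or phrase, not inside other words.
        pattern = r"(?<!\w)" + re.escape(marker) + r"(?!\w)"
        frequency = len(re.findall(pattern, lowered))
        features["DM_flag_" + marker] = int(frequency > 0)
        features["DM_tf_" + marker] = frequency
        total += frequency
    features["DM_any"] = int(total > 0)   # does the quotation contain markers?
    features["DM_count"] = total          # how many marker occurrences?
    return features


quote = "Wir sind zufrieden mit dem Stand der Dinge, aber wir wollen mehr"
result = discourse_marker_features(quote)
print(result["DM_tf_aber"], result["DM_any"], result["DM_count"])  # 1 1 1
```

Matching on the lowercased raw text rather than on tokens is a design choice made here so that multi-word markers such as “abgesehen davon dass” are found without a separate phrase tokenizer.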
All Features. Table 1.4 provides an overview of all feature groups that we use in
our sentiment analysis approach. A feature vector containing all feature combinations
consists of 160 K entries. If we include text position information in the feature
vector, the number of entries rises to almost 650 K. Because the dictionaries are
relatively large in comparison to the short quotations, the resulting feature vectors
are very sparse, which we have to deal with.
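
One common way to handle such sparsity, shown here only as an assumed illustration and not as the approach of the original system, is to store just the non-zero entries of each vector. The vocabulary, feature names, and helper functions below are hypothetical.

```python
def to_sparse(features, vocabulary):
    """Map {feature_name: value} to {column_index: value}, dropping zeros."""
    return {
        vocabulary[name]: value
        for name, value in features.items()
        if value != 0 and name in vocabulary
    }


def density(sparse_vector, dimension):
    """Fraction of the full vector that is actually non-zero."""
    return len(sparse_vector) / dimension


# Hypothetical three-entry vocabulary; the text reports about 160 K entries.
vocabulary = {"NOT_gut": 0, "DM_tf_aber": 1, "DM_tf_weil": 2}
features = {"NOT_gut": -0.37, "DM_tf_aber": 1, "DM_tf_weil": 0}

sparse = to_sparse(features, vocabulary)
print(sparse)  # {0: -0.37, 1: 1}
print(density(sparse, 160_000))  # 1.25e-05
```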
23 http://hypermedia.ids-mannheim.de/index.html
24 http://www1.ids-mannheim.de/start/