ment of pos and neg labels to phrases which were found to be opinions (that is, not neutral) after the word SO label assignment stage is completed.
We compared opine with PMI++ and Hu++ on the tasks of interest.
We found that opine had the highest precision on both tasks, with only a small loss in recall relative to PMI++. opine's ability to identify a word's SO label
in the context of a given feature and sentence allows the system to correctly
extract opinions expressed by words such as “big” or “small,” whose semantic
orientation varies based on context.
opine's performance is negatively affected by a number of factors: parsing errors lead to missed candidate opinions and incorrect opinion polarity assignments; other problems include sparse data (in the case of infrequent opinion words) and complicated opinion expressions (e.g., nested opinions, conditionals, and subjunctive expressions).
2.3.7 Ranking Opinion Phrases
opine clusters opinions in order to identify the properties to which they refer.
Given an opinion cluster A corresponding to some property, opine ranks its
elements based on their relative strength . The probabilities computed at the
end of the relaxation-labeling scheme generate an initial opinion ranking.
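As a rough illustration of this step, the sketch below simply orders the opinions in a cluster by their final label probabilities; the cluster contents, probability values, and dictionary-based interface are illustrative assumptions, not OPINE's actual data structures.

# Illustrative sketch (not OPINE's implementation): the initial ranking of an
# opinion cluster is obtained by sorting its elements on the probabilities
# produced at the end of relaxation labeling. Values below are made up.
def initial_ranking(label_probabilities):
    """Order opinion phrases from strongest to weakest by final probability."""
    return sorted(label_probabilities, key=label_probabilities.get, reverse=True)

cluster = {"deafening": 0.93, "loud": 0.71, "noisy": 0.64}   # hypothetical values
print(initial_ranking(cluster))                              # ['deafening', 'loud', 'noisy']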
Table 2.11. Lexical Patterns Used to Derive Opinions' Relative Strength.

a, (*) even b         a, (*) not b
a, (*) virtually b    a, (*) almost b
a, (*) near b         a, (*) close to b
a, (*) quite b        a, (*) mostly b
In order to improve this initial ranking, opine uses additional Web-derived constraints on the relative strength of phrases. As pointed out in [8], patterns such as “a1, (*) even a2” are good indicators of how strong a1 is relative to a2. To our knowledge, the sparse data problem mentioned in [8] has so far prevented such strength information from being computed for adjectives from typical news corpora. However, the Web allows us to use such patterns in order to refine our opinion rankings. opine starts with the pattern mentioned before and bootstraps a set of similar patterns (see Table 2.11). Given a cluster A, queries which instantiate such patterns with pairs of cluster elements are used to derive constraints such as:
c1 = (strength(deafening) > strength(loud)),
c2 = (strength(spotless) > strength(clean)).
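To make the query step concrete, the following sketch instantiates such patterns with pairs of cluster elements and compares Web hit counts for the two orderings; the hit_count stub, the pattern list, and the decision margin are assumptions for illustration rather than OPINE's actual queries.

# Illustrative sketch: derive relative-strength constraints from Web
# co-occurrence patterns. hit_count() stands in for a search-engine hit
# count (not a real API); the patterns and margin are assumptions.
PATTERNS = ["{a}, even {b}", "{a}, not {b}", "{a}, almost {b}", "{a}, close to {b}"]

def hit_count(query):
    # Hard-coded example counts so the sketch runs end to end.
    fake_counts = {"loud, even deafening": 850, "deafening, even loud": 3}
    return fake_counts.get(query, 0)

def derive_constraints(cluster, margin=2.0):
    """Return (stronger, weaker) pairs supported by the pattern queries."""
    constraints = []
    for a in cluster:
        for b in cluster:
            if a == b:
                continue
            # In patterns like "a, (*) even b", b is typically the stronger term.
            hits_ab = sum(hit_count(p.format(a=a, b=b)) for p in PATTERNS)
            hits_ba = sum(hit_count(p.format(a=b, b=a)) for p in PATTERNS)
            if hits_ab > margin * max(hits_ba, 1):
                constraints.append((b, a))   # strength(b) > strength(a)
    return constraints

print(derive_constraints(["loud", "deafening"]))   # [('deafening', 'loud')]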
opine also uses synonymy- and antonymy-based constraints, since synonyms and antonyms tend to have similar strength:
c3 = (strength(clean) = strength(dirty)).
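As a simplified illustration of how a consistent set of such constraints yields an ordering (the general CSP formulation OPINE actually uses is described next), the strict inequality constraints can be viewed as edges of a "stronger than" graph and sorted topologically; equality constraints from synonyms and antonyms could be handled by merging the two phrases into a single node, which this sketch omits.

# Sketch only (not OPINE's CSP solver): order the phrases of a cluster so
# that every strength(x) > strength(y) constraint is respected, using a
# topological sort of the "stronger than" relation.
from graphlib import TopologicalSorter

def rank_by_constraints(elements, stronger_than):
    """elements: phrases in the cluster; stronger_than: (stronger, weaker) pairs."""
    # graphlib expects node -> set of predecessors; making the stronger phrase a
    # predecessor of the weaker one puts stronger phrases first in the order.
    predecessors = {e: set() for e in elements}
    for stronger, weaker in stronger_than:
        predecessors[weaker].add(stronger)
    return list(TopologicalSorter(predecessors).static_order())

constraints = [("deafening", "loud"), ("spotless", "clean")]
print(rank_by_constraints(["loud", "deafening", "clean", "spotless"], constraints))
# Any order consistent with the constraints, e.g.
# ['deafening', 'spotless', 'loud', 'clean']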
The set S of such constraints induces a constraint satisfaction problem
(CSP) whose solution is a ranking of the cluster elements affected by S (the