We evaluate opine on three tasks: finding the SO labels of words in the context of known features and sentences (SO label extraction); distinguishing between opinion and non-opinion phrases in the context of known features and sentences (opinion phrase extraction); and finding the correct polarity of extracted opinion phrases in the context of known features and sentences (opinion phrase polarity extraction).
We first ran opine on 13,841 sentences and 538 previously extracted features. opine searched for an SO label assignment for 1,756 different words in the context of the given features and sentences. We compared opine against two baseline methods, PMI++ and Hu++.
PMI++ is an extended version of [1]'s method for finding the SO label of a word or phrase. Given a (word, feature, sentence) tuple, PMI++ ignores the sentence, generates a phrase containing the word and the feature (e.g., "clean room"), and finds its SO label using PMI statistics. If unsure of the label, PMI++ falls back on the orientation of the potential opinion word alone. The search-engine queries use domain-specific keywords (e.g., "clean room" + "hotel"), which are dropped if they lead to low hit counts. PMI++ also uses morphological information (e.g., wonderful and wonderfully are likely to have similar semantic orientation labels).
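To make the PMI computation concrete, the Python sketch below shows a Turney-style SO score of the kind PMI++ builds on. The hits interface, the "excellent"/"poor" seed words, the smoothing constant, and the MIN_HITS threshold are illustrative assumptions rather than details of the original system; in [1] the counts came from Web search-engine queries.

```python
import math

# Assumed interface: number of documents matching a query. In the original
# method these counts came from Web search-engine queries.
def hits(query: str) -> int:
    raise NotImplementedError("plug in a search-engine or corpus counter")

POS_SEED, NEG_SEED = "excellent", "poor"  # Turney-style reference words
MIN_HITS = 20  # assumed threshold below which counts are deemed unreliable

def so_score(word: str, feature: str, domain: str) -> float:
    """PMI-based SO score for the phrase built from (word, feature).

    A score > 0 suggests a positive label, < 0 a negative one. The domain
    keyword (e.g., "hotel") is dropped if it leads to low counts.
    """
    phrase = f'"{word} {feature}"'
    for query in (f"{phrase} {domain}", phrase):  # with, then without, domain
        pos = hits(f"{query} {POS_SEED}")
        neg = hits(f"{query} {NEG_SEED}")
        if pos + neg >= MIN_HITS:
            break
    else:
        # Fall back on the orientation of the potential opinion word alone.
        pos, neg = hits(f"{word} {POS_SEED}"), hits(f"{word} {NEG_SEED}")
    # PMI(phrase, POS_SEED) - PMI(phrase, NEG_SEED) reduces to this ratio
    # of (smoothed) co-occurrence counts.
    return math.log2(((pos + 0.5) * hits(NEG_SEED)) /
                     ((neg + 0.5) * hits(POS_SEED)))
```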
Hu++ is a WordNet-based method for finding a word's context-independent semantic orientation. It extends Hu's adjective-labeling method [2] to handle nouns, verbs, and adverbs, and to improve coverage. Hu's method starts with two seed sets of positive and negative words and iteratively grows each one by including synonyms and antonyms from WordNet. The final sets are used to predict the orientation of an incoming word. Hu++ also makes use of WordNet IS-A relationships (e.g., problem IS-A difficulty) and morphological information.
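The core of Hu's bootstrapping procedure can be sketched in Python using NLTK's WordNet interface. The seed sets, iteration count, and handling of conflicting words below are assumptions made for illustration; the IS-A and morphology extensions of Hu++ are omitted.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def _synonyms(word):
    return {lem.name() for syn in wn.synsets(word) for lem in syn.lemmas()}

def _antonyms(word):
    return {ant.name() for syn in wn.synsets(word)
            for lem in syn.lemmas() for ant in lem.antonyms()}

def expand_seed_sets(pos_seeds, neg_seeds, iterations=3):
    """Grow positive/negative word sets via WordNet: a synonym keeps the
    orientation of the word it came from, an antonym flips it. Words
    reachable from both sets are discarded as conflicts."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos = ({w for p in pos for w in _synonyms(p)} |
                   {w for n in neg for w in _antonyms(n)})
        new_neg = ({w for n in neg for w in _synonyms(n)} |
                   {w for p in pos for w in _antonyms(p)})
        pos |= new_pos
        neg |= new_neg
    conflicts = pos & neg
    return pos - conflicts, neg - conflicts

def orientation(word, pos, neg):
    """Predict a context-independent SO label for an incoming word."""
    if word in pos:
        return "positive"
    if word in neg:
        return "negative"
    return "neutral"
```

Because wn.synsets is not restricted to a part of speech here, the expansion covers adjectives, nouns, verbs, and adverbs alike, which is the coverage extension Hu++ introduces.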
Experiments: Word SO Labels
On the task of finding SO labels for words in the context of given features and
review sentences, opine obtains higher precision than both baseline methods
at a small loss in recall with respect to PMI++ . As described below, this
result is due in large part to opine's ability to handle context-sensitive opinion
words.
We randomly selected 200 (word, feature, sentence) tuples for each word
type (adjective, adverb, etc.) and obtained a test set containing 800 tuples.
Two annotators assigned positive, negative and neutral labels to each tuple
(the inter-annotator agreement was 78%). We retained the tuples on which
the annotators agreed as the gold standard. We ran PMI++ and Hu++ on
the test data and compared the results against opine's results on the same
data.
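A minimal sketch of how such a gold standard can be assembled follows; the function and argument names are illustrative, not taken from the original evaluation.

```python
def build_gold_standard(tuples, labels_a, labels_b):
    """Keep only the tuples on which the two annotators agree.

    labels_a and labels_b map each (word, feature, sentence) tuple to one
    of "positive", "negative", or "neutral". The returned agreement rate
    corresponds to the 78% figure reported above.
    """
    gold = {t: labels_a[t] for t in tuples if labels_a[t] == labels_b[t]}
    agreement = len(gold) / len(tuples)
    return gold, agreement
```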
To quantify the benefit of each of the three steps of our method for finding SO labels, we also compared opine with a version that only finds SO labels for words, and with a version that finds SO labels for words in the context of given features but does not take the given sentences into account. This comparison showed that opine's precision gain over PMI++