We evaluate opine on three tasks: finding the SO labels of words in the context of known features and sentences (SO label extraction); distinguishing between opinion and non-opinion phrases in the context of known features and sentences (opinion phrase extraction); and finding the correct polarity of extracted opinion phrases in the context of known features and sentences (opinion phrase polarity extraction).
We first ran opine on 13,841 sentences and 538 previously extracted features. opine searched for an SO label assignment for 1,756 different words in the context of the given features and sentences. We compared opine against two baseline methods, PMI++ and Hu++.
PMI++ is an extended version of [1]'s method for finding the SO label of a word or phrase. Given a (word, feature, sentence) tuple, PMI++ ignores the sentence, generates a phrase containing the word and the feature (e.g., "clean room"), and finds its SO label using PMI statistics. If unsure of the label, PMI++ falls back on the orientation of the potential opinion word alone. The search-engine queries use domain-specific keywords (e.g., "clean room" + "hotel"), which are dropped if they lead to low hit counts. PMI++ also uses morphological information (e.g., wonderful and wonderfully are likely to have similar semantic orientation labels).
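To make the PMI computation concrete, the Python sketch below shows a Turney-style SO score of the kind PMI++ builds on. The hits interface, the "excellent"/"poor" seed words, the smoothing constant, and the MIN_HITS threshold are illustrative assumptions rather than details of the original system; in [1] the counts came from Web search-engine queries.

```python
import math

# Assumed interface: number of documents matching a query. In the original
# method these counts came from Web search-engine queries.
def hits(query: str) -> int:
    raise NotImplementedError("plug in a search-engine or corpus counter")

POS_SEED, NEG_SEED = "excellent", "poor"  # Turney-style reference words
MIN_HITS = 20  # assumed threshold below which counts are deemed unreliable

def so_score(word: str, feature: str, domain: str) -> float:
    """PMI-based SO score for the phrase built from (word, feature).

    A score > 0 suggests a positive label, < 0 a negative one. The domain
    keyword (e.g., "hotel") is dropped if it leads to low counts.
    """
    phrase = f'"{word} {feature}"'
    for query in (f"{phrase} {domain}", phrase):  # with, then without, domain
        pos = hits(f"{query} {POS_SEED}")
        neg = hits(f"{query} {NEG_SEED}")
        if pos + neg >= MIN_HITS:
            break
    else:
        # Fall back on the orientation of the potential opinion word alone.
        pos, neg = hits(f"{word} {POS_SEED}"), hits(f"{word} {NEG_SEED}")
    # PMI(phrase, POS_SEED) - PMI(phrase, NEG_SEED) reduces to this ratio
    # of (smoothed) co-occurrence counts.
    return math.log2(((pos + 0.5) * hits(NEG_SEED)) /
                     ((neg + 0.5) * hits(POS_SEED)))
```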
Hu++ is a WordNet-based method for finding a word's context-independent semantic orientation. It extends Hu's adjective-labeling method [2] to handle nouns, verbs, and adverbs, and to improve coverage. Hu's method starts with two seed sets of positive and negative words and iteratively grows each one by including synonyms and antonyms from WordNet. The final sets are used to predict the orientation of an incoming word. Hu++ also makes use of WordNet IS-A relationships (e.g., problem IS-A difficulty) and morphological information.
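The core of Hu's bootstrapping procedure can be sketched in Python using NLTK's WordNet interface. The seed sets, iteration count, and handling of conflicting words below are assumptions made for illustration; the IS-A and morphology extensions of Hu++ are omitted.

```python
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def _synonyms(word):
    return {lem.name() for syn in wn.synsets(word) for lem in syn.lemmas()}

def _antonyms(word):
    return {ant.name() for syn in wn.synsets(word)
            for lem in syn.lemmas() for ant in lem.antonyms()}

def expand_seed_sets(pos_seeds, neg_seeds, iterations=3):
    """Grow positive/negative word sets via WordNet: a synonym keeps the
    orientation of the word it came from, an antonym flips it. Words
    reachable from both sets are discarded as conflicts."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    for _ in range(iterations):
        new_pos = ({w for p in pos for w in _synonyms(p)} |
                   {w for n in neg for w in _antonyms(n)})
        new_neg = ({w for n in neg for w in _synonyms(n)} |
                   {w for p in pos for w in _antonyms(p)})
        pos |= new_pos
        neg |= new_neg
    conflicts = pos & neg
    return pos - conflicts, neg - conflicts

def orientation(word, pos, neg):
    """Predict a context-independent SO label for an incoming word."""
    if word in pos:
        return "positive"
    if word in neg:
        return "negative"
    return "neutral"
```

Because wn.synsets is not restricted to a part of speech here, the expansion covers adjectives, nouns, verbs, and adverbs alike, which is the coverage extension Hu++ introduces.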
Experiments: Word SO Labels
On the task of finding SO labels for words in the context of given features and
review sentences, opine obtains higher precision than both baseline methods
at a small loss in recall with respect to PMI++ . As described below, this
result is due in large part to opine's ability to handle context-sensitive opinion
words.
We randomly selected 200 (word, feature, sentence) tuples for each word
type (adjective, adverb, etc.) and obtained a test set containing 800 tuples.
Two annotators assigned positive, negative and neutral labels to each tuple
(the inter-annotator agreement was 78%). We retained the tuples on which
the annotators agreed as the gold standard. We ran PMI++ and Hu++ on
the test data and compared the results against opine's results on the same
data.
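A minimal sketch of how such a gold standard can be assembled follows; the function and argument names are illustrative, not taken from the original evaluation.

```python
def build_gold_standard(tuples, labels_a, labels_b):
    """Keep only the tuples on which the two annotators agree.

    labels_a and labels_b map each (word, feature, sentence) tuple to one
    of "positive", "negative", or "neutral". The returned agreement rate
    corresponds to the 78% figure reported above.
    """
    gold = {t: labels_a[t] for t in tuples if labels_a[t] == labels_b[t]}
    agreement = len(gold) / len(tuples)
    return gold, agreement
```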
To quantify the benefit of each of the three steps of our method for finding SO labels, we also compared opine with a version that only finds SO labels for words, and with a version that finds SO labels for words in the context of given features but does not take the given sentences into account. This comparison showed that opine's precision gain over PMI++