between the phrase and meronymy discriminators associated with the product class (see Table 2.2). opine distinguishes parts from properties using WordNet's IS-A hierarchy (which enumerates different kinds of properties) and morphological cues (e.g., "-iness", "-ity" suffixes).
Given a target product class C, opine finds concepts related to C by extracting frequent noun phrases as well as noun phrases linked to C or C's instances through verbs or prepositions (e.g., "The scanner produces great images"). Related concepts are assessed as described in [6] and then stored as product features together with their parts and properties.
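The frequent-noun-phrase step above can be sketched in miniature. This assumes reviews have already been chunked into noun phrases (the chunking itself is out of scope here), and the minimum-support threshold is an illustrative parameter, not opine's actual setting:

```python
from collections import Counter

def frequent_noun_phrases(chunked_reviews, min_support=3):
    """Toy sketch: given reviews already chunked into noun phrases,
    keep phrases whose frequency meets a minimum-support threshold,
    as candidate product features."""
    counts = Counter(
        np.lower() for review in chunked_reviews for np in review
    )
    return {np for np, c in counts.items() if c >= min_support}

# Hypothetical pre-chunked scanner reviews:
reviews = [
    ["scanner", "image quality", "scan speed"],
    ["image quality", "software"],
    ["image quality", "scan speed", "price"],
]
features = frequent_noun_phrases(reviews, min_support=2)
# {"image quality", "scan speed"}
```

In the actual system, candidates surviving this frequency filter would then be assessed (as described in [6]) before being accepted as features.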
2.3.3 Experiments: Explicit Feature Extraction
The previous review-mining systems most relevant to our work are those in [2] and [7]. We only had access to the data used in [2], so our experiments include a comparison between opine and Hu and Liu's system, but no direct comparison between opine and IBM's SentimentAnalyzer [7] (see the related work section for a discussion of this work).
Hu and Liu's system uses association rule mining to extract frequent review noun phrases as features. Frequent features are used to find potential opinion words (only adjectives), and the system uses WordNet synonyms and antonyms in conjunction with a set of seed words in order to find actual opinion words. Finally, opinion words are used to extract associated infrequent features. The system only extracts explicit features.
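The seed-word expansion that Hu and Liu's system performs with WordNet can be sketched as follows; the tiny synonym/antonym tables stand in for WordNet lookups, and the integer polarity encoding is an illustrative assumption:

```python
def expand_seed_lexicon(seed_polarity, synonyms, antonyms):
    """Toy sketch of seed expansion: a synonym inherits a word's
    polarity (+1 positive, -1 negative), an antonym gets the
    opposite; expansion repeats until no new words are reached."""
    polarity = dict(seed_polarity)
    frontier = list(seed_polarity)
    while frontier:
        word = frontier.pop()
        for syn in synonyms.get(word, []):
            if syn not in polarity:
                polarity[syn] = polarity[word]
                frontier.append(syn)
        for ant in antonyms.get(word, []):
            if ant not in polarity:
                polarity[ant] = -polarity[word]
                frontier.append(ant)
    return polarity

# Hypothetical lookup tables standing in for WordNet:
lexicon = expand_seed_lexicon(
    seed_polarity={"good": 1},
    synonyms={"good": ["great"]},
    antonyms={"good": ["bad"], "great": ["awful"]},
)
# {"good": 1, "great": 1, "bad": -1, "awful": -1}
```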
On the five datasets used in [2], opine's precision is 22% higher than Hu's, at the cost of a 3% drop in recall. There are two important differences between
opine and Hu's system: a) opine's Feature Assessor uses PMI assessment to
evaluate each candidate feature and b) opine incorporates Web PMI statistics
in addition to review data in its assessment. In the following, we quantify the
performance gains from a) and b).
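A PMI-style assessment of a candidate feature against a discriminator phrase can be sketched as below. The hit counts are made up for illustration, and the exact normalization opine uses may differ in detail:

```python
def pmi_score(hits_both, hits_feature, hits_discriminator):
    """PMI-style association score between a candidate feature f and
    a discriminator phrase d:
        score(f, d) ~ Hits(f AND d) / (Hits(f) * Hits(d))
    where Hits(.) counts occurrences in review data or Web search
    results. Higher scores suggest f is a genuine feature."""
    if hits_feature == 0 or hits_discriminator == 0:
        return 0.0
    return hits_both / (hits_feature * hits_discriminator)

# Hypothetical counts for the candidate "size" against the
# discriminator "of scanner":
score = pmi_score(hits_both=120, hits_feature=10_000,
                  hits_discriminator=5_000)
```

Assessing candidates against Web-scale counts rather than review-only counts is what gives the additional precision measured in b) below.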
a) In order to quantify the benefits of opine's Feature Assessor, we use
it to evaluate the features extracted by Hu's algorithm on review data. The
Feature Assessor improves Hu's precision by 6%.
b) In order to evaluate the impact of using Web PMI statistics, we assess
opine's features first on reviews, and then on reviews in conjunction with the
Web. Web PMI statistics increase precision by an average of 14.5%.
Overall, 1/3 of opine's precision increase over Hu's system comes from using PMI assessment on reviews and the other 2/3 from the use of the Web PMI statistics.
In order to show that opine's performance is robust across multiple product classes, we used two sets of 1,307 reviews downloaded from tripadvisor.com for Hotels and amazon.com for Scanners. Two annotators labeled a set of 450 unique opine extractions as correct or incorrect. The inter-annotator agreement was 86%. The extractions on which the annotators agreed were used to compute opine's precision, which was 89%. Furthermore, the annotators extracted explicit features from 800 review sentences (400 for