Cross-Domain Opinion Word Identification with Query-By-Committee Active Learning - Technologies and Applications of Artificial Intelligence

2 Related Work

2.1 Cross-Domain Opinion Mining

Several studies have tackled the problem of cross-domain opinion mining. Aue and Gamon [1] compared four strategies for utilizing opinion-labeled data from one or more non-target domains and concluded that using non-target labeled data without an adaptation strategy is less effective than using unlabeled data from the target domain. Jakob and Gurevych [4] proposed a CRF-based approach to opinion target extraction in single and multiple domains, using reviews from the web-service, movie, automobile, and camera domains. They found that when the token string feature was removed, cross-domain extraction performance in terms of F-measure approached that of single-domain extraction.

Bollegala et al. [2] addressed cross-domain sentiment classification. They built a sentiment-sensitive thesaurus from both labeled and unlabeled data drawn from multiple source domains and used it to find associations between words that express similar sentiments in different domains.
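To make the idea of a sentiment-sensitive thesaurus more concrete, the following Python sketch is a deliberately simplified stand-in, not Bollegala et al.'s method. It assumes each word is represented by counts of the words it co-occurs with in a review, plus a pseudo-feature for the sentiment label of labeled reviews, and it swaps in plain cosine similarity over raw counts as the relatedness measure. The original work uses its own feature weighting and relatedness measure; the function names and the `__label_` feature prefix are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_vectors(reviews):
    """Map each word to a sparse feature vector (simplified, hypothetical).

    reviews: list of (tokens, label), label in {"pos", "neg"} or None if unlabeled.
    Features are the other words in the same review plus, for labeled
    reviews, a pseudo-feature recording that review's sentiment label.
    """
    vecs = defaultdict(Counter)
    for tokens, label in reviews:
        vocab = set(tokens)
        for w in vocab:
            for c in vocab:
                if c != w:
                    vecs[w][c] += 1
            if label is not None:
                vecs[w]["__label_" + label] += 1  # sentiment-sensitive feature
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing keys)."""
    dot = sum(count * v[k] for k, count in u.items())
    norm_sq = sum(x * x for x in u.values()) * sum(x * x for x in v.values())
    return dot / sqrt(norm_sq) if norm_sq else 0.0


def related_words(word, vecs, k=3):
    """Return the k words whose vectors are most similar to `word`'s vector."""
    target = vecs.get(word, Counter())  # .get avoids inserting unknown words
    scored = [(cosine(target, vec), other) for other, vec in vecs.items() if other != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]


reviews = [
    (["great", "battery"], "pos"),
    (["excellent", "battery"], "pos"),
    (["poor", "battery"], "neg"),
]
print(related_words("great", build_vectors(reviews), k=1))  # → ['excellent']
```

Here `great` and `excellent` share both their context word and their positive label, while `poor` shares only the context word, so the label pseudo-feature is what separates words of opposite polarity that appear in similar contexts.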

2.2 Opinion Mining with Active Learning

Active learning aims to reduce the manual labeling of target-domain data while also improving performance. Li et al. [5] proposed an active-learning-based selection strategy for cross-domain sentiment classification. They trained two classifiers, one on labeled source data and the other on labeled target data, and used them to select informative samples. The two classifiers were then combined to make the final decision. We extend this approach and adapt it for OWI.
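The two-classifier selection step can be sketched as query-by-committee with two members. This Python sketch does not reproduce Li et al.'s exact selection criterion or combination rule: it assumes each member is any callable returning P(positive | x), ranks unlabeled samples by the absolute difference between the members' probabilities (with closeness of their average to 0.5 as a tie-breaker), and combines the members with a fixed-weight average. The function names, the `Member` interface, and the weight `w` are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence

# Hypothetical interface: a committee member returns P(positive | x) in [0, 1].
Member = Callable[[Hashable], float]


def select_by_disagreement(unlabeled: Sequence[Hashable], source_clf: Member,
                           target_clf: Member, k: int) -> List[Hashable]:
    """Query-by-committee selection with two members.

    Samples are ranked by how much the source-trained and target-trained
    members disagree (absolute difference of their probabilities); the
    closeness of their average probability to 0.5 breaks ties.
    """
    def score(x):
        p_s, p_t = source_clf(x), target_clf(x)
        disagreement = abs(p_s - p_t)
        uncertainty = 1.0 - 2.0 * abs((p_s + p_t) / 2.0 - 0.5)
        return (disagreement, uncertainty)

    return sorted(unlabeled, key=score, reverse=True)[:k]


def combined_decision(x: Hashable, source_clf: Member, target_clf: Member,
                      w: float = 0.5) -> int:
    """Final label from a weighted average of both members (w is an assumed weight)."""
    p = w * source_clf(x) + (1.0 - w) * target_clf(x)
    return 1 if p >= 0.5 else 0


# Toy members backed by dictionaries of precomputed probabilities.
source = {"s1": 0.9, "s2": 0.2, "s3": 0.6}
target = {"s1": 0.85, "s2": 0.8, "s3": 0.4}
print(select_by_disagreement(list(source), source.get, target.get, k=2))  # → ['s2', 's3']
print(combined_decision("s1", source.get, target.get))  # → 1
```

Samples on which a source-trained and a target-trained model disagree are the ones where domain shift is most likely to matter, which is why they are sent for manual labeling first.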

3 Our Approach

In our approach, we formulate OWI as a sequence labeling task and model it with conditional random fields (CRFs), using the CRF++ package.

Because Chinese words are not separated by spaces, we use the CKIP word segmentation tool¹ to segment all review sentences into individual words and tag each word's part of speech (POS).
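After segmentation and POS tagging, each sentence has to be written in the column format that CRF++ reads: one token per line, whitespace-separated columns with the gold label in the last column, and a blank line between sentences. The Python sketch below assumes a three-column layout (word, POS tag, label); the example sentence, its tags, and the `B-OP`/`O` label scheme are illustrative assumptions, since the label set is not specified here.

```python
from typing import List, Sequence, Tuple

# (word, POS tag, label) for one segmented, POS-tagged token.
Token = Tuple[str, str, str]


def to_crfpp(sentences: Sequence[Sequence[Token]]) -> str:
    """Serialize sentences into CRF++'s column format: one token per line,
    tab-separated columns with the label last, and a blank line after each sentence."""
    lines: List[str] = []
    for sentence in sentences:
        for word, pos, label in sentence:
            lines.append(f"{word}\t{pos}\t{label}")
        lines.append("")  # blank line marks the sentence boundary
    return "\n".join(lines) + "\n"


# Illustrative sentence "螢幕 很 漂亮" ("the screen is very beautiful");
# the POS tags and the B-OP/O labels are assumptions for this example.
example = [[("螢幕", "Na", "O"), ("很", "Dfa", "O"), ("漂亮", "VH", "B-OP")]]
print(to_crfpp(example), end="")
```

A file produced this way, together with a feature template file, is what CRF++'s `crf_learn template train.data model` command trains on, and `crf_test -m model test.data` appends a predicted label column to new data in the same format.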

3.1 Features

Contextual Part-of-Speech Features. Opinion words are generally adjectives; however, since not all adjectives are opinion words, we must also consider the context of the target word (the current token). Contextual part-of-speech features describe the POS tags of the words surrounding the current token. We denote them as follows: $pos_i$ is the POS tag of the word at position $i$ relative to the target token, whose own tag is $pos_0$. Our system uses the window $pos_{-3}$ to $pos_{3}$.
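The following Python sketch illustrates what the window $pos_{-3}$ to $pos_{3}$ contains for a given token, and how the same window could be expressed as a CRF++ feature template. It assumes the POS tag sits in column 1 of the training data (word in column 0) and pads positions outside the sentence with a hypothetical `<PAD>` value; in a CRF++ template, `%x[row,col]` refers to the token at a relative row offset in the given column, and a `B` line adds label-bigram features. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

PAD = "<PAD>"  # hypothetical filler for positions outside the sentence


def contextual_pos_features(pos_tags: Sequence[str], i: int,
                            window: int = 3) -> Dict[str, str]:
    """Return pos_{-window} .. pos_{window} for the token at index i."""
    features: Dict[str, str] = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Explicit bounds check: a negative j must not wrap around the list.
        features[f"pos{offset}"] = pos_tags[j] if 0 <= j < len(pos_tags) else PAD
    return features


def crfpp_pos_template(window: int = 3, pos_column: int = 1) -> str:
    """CRF++ template: one unigram feature per offset on the POS column,
    plus 'B' for label-bigram (transition) features."""
    lines: List[str] = [
        f"U{k:02d}:%x[{offset},{pos_column}]"
        for k, offset in enumerate(range(-window, window + 1))
    ]
    lines.append("B")
    return "\n".join(lines) + "\n"


print(contextual_pos_features(["Na", "Dfa", "VH"], 2))
print(crfpp_pos_template(), end="")
```

In practice CRF++ expands the template itself during training, so the explicit feature function serves mainly to show which tags enter the window; the template is the form the toolkit consumes.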

1 http://ckipsvr.iis.sinica.edu.tw
