2 Related Work
2.1 Cross-Domain Opinion Mining
Several studies have tackled the problem of cross-domain opinion mining. Aue and Gamon [1] compared four strategies for using opinion-labeled data from one or more non-target domains and concluded that using non-target labeled data without an adaptation strategy is less effective than using unlabeled data from the target domain. Jakob and Gurevych [4] proposed a CRF-based approach to opinion target extraction in single and multiple domains, using reviews from the web-service, movie, automobile, and camera domains. They found that when the token string feature was removed, cross-domain extraction performance, measured by F-measure, approached that of single-domain extraction.
Bollegala et al. [2] addressed cross-domain sentiment classification. They built a sentiment-sensitive thesaurus from both labeled and unlabeled data drawn from multiple source domains and used it to find associations between words that express similar sentiments in different domains.
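To make the idea of a sentiment-sensitive thesaurus concrete, the following is a minimal sketch and not Bollegala et al.'s actual method. Each word is represented by raw counts of the words it co-occurs with in a review plus, for labeled reviews, the review's sentiment label, and words are compared by cosine similarity. The original work defines its own feature weighting and relatedness measure, which this sketch does not reproduce; the input format, the toy reviews, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def build_thesaurus_vectors(reviews):
    """Map each word to a sparse feature vector.

    `reviews` is a list of (tokens, label) pairs, where label is "pos",
    "neg", or None for unlabeled reviews (hypothetical input format).
    """
    vectors = defaultdict(Counter)
    for tokens, label in reviews:
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word]["w:" + other] += 1  # lexical co-occurrence feature
            if label is not None:
                vectors[word]["s:" + label] += 1  # sentiment feature (labeled data only)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def related_words(word, vectors, top_n=3):
    """Return the top_n words most similar to `word`, with their scores."""
    if word not in vectors:
        return []
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]

# Toy reviews from two hypothetical domains.
reviews = [
    (["battery", "lasts", "long", "excellent"], "pos"),  # electronics
    (["plot", "gripping", "excellent"], "pos"),           # movies
    (["screen", "dim", "disappointing"], "neg"),          # electronics
    (["acting", "wooden", "disappointing"], "neg"),       # movies
]
vectors = build_thesaurus_vectors(reviews)
print(related_words("dim", vectors))
```

In this toy data, "dim" (electronics) and "wooden" (movies) come out related because both co-occur with "disappointing" in negative reviews, while "dim" and "gripping" share no features at all.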
2.2 Opinion Mining with Active Learning
Active learning can reduce the manual labeling effort required for target-domain data while also improving performance. Li et al. [5] proposed an active-learning-based selection strategy for cross-domain sentiment classification. They trained two classifiers, one on labeled source data and the other on labeled target data, and used them to select informative samples. The two classifiers were then combined to make the final decision. We extend this approach and adapt it for OWI.
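The selection step described for Li et al. [5] can be sketched as follows. This text does not specify their selection criterion or how the two classifiers are combined, so both are assumptions here: samples on which the source-trained and target-trained classifiers disagree are queried first (a query-by-committee-style rule), ties are broken by how close the averaged probability is to 0.5, and the final decision uses that average. Classifiers are modeled as plain functions returning P(positive); the function names are hypothetical.

```python
from typing import Callable, Iterable, List

# Any function mapping a sample to P(positive) stands in for a trained classifier.
Classifier = Callable[[str], float]

def combined_prob(sample: str, clf_source: Classifier, clf_target: Classifier,
                  weight: float = 0.5) -> float:
    """Weighted average of the two classifiers' probabilities (assumed rule)."""
    return weight * clf_source(sample) + (1 - weight) * clf_target(sample)

def select_informative(pool: Iterable[str], clf_source: Classifier,
                       clf_target: Classifier, k: int) -> List[str]:
    """Return the k unlabeled target samples to send for manual labeling."""
    def key(sample: str):
        p_s, p_t = clf_source(sample), clf_target(sample)
        disagree = (p_s >= 0.5) != (p_t >= 0.5)
        uncertainty = abs(combined_prob(sample, clf_source, clf_target) - 0.5)
        # False sorts before True, so disagreements come first;
        # within each group, the least confident samples come first.
        return (not disagree, uncertainty)
    return sorted(pool, key=key)[:k]

# Toy probabilities standing in for trained source/target classifiers.
source_probs = {"a": 0.9, "b": 0.2, "c": 0.55, "d": 0.6}
target_probs = {"a": 0.8, "b": 0.7, "c": 0.45, "d": 0.9}
picked = select_informative(source_probs, source_probs.__getitem__,
                            target_probs.__getitem__, k=2)
print(picked)
```

In a full active-learning loop, the selected samples would be labeled manually, added to the target training set, and the target classifier retrained before the next selection round.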
3 Our Approach
In our approach, we formulate OWI as a sequence labeling task and model it with conditional random fields (CRFs), implemented with the CRF++ package. Because Chinese text does not separate words with spaces, we use the CKIP word segmentation tool¹ to segment all review sentences into individual words and to tag each word's part of speech (POS).
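For reference, CRF++ reads training and test data as plain text with one token per line, whitespace-separated feature columns, the answer tag in the last column, and a blank line between sentences. The sketch below writes segmented, POS-tagged sentences in that format. The three-column layout (word, POS, label), the label names `OW` and `O`, and the CKIP-style example tags are illustrative assumptions, not the paper's actual labeling scheme.

```python
from typing import List, Tuple

# One token: (word, POS tag, label). Labels are hypothetical:
# "OW" marks an opinion word, "O" everything else.
Token = Tuple[str, str, str]

def to_crfpp(sentences: List[List[Token]]) -> str:
    """Serialize sentences into CRF++'s column-based data format."""
    lines = []
    for sentence in sentences:
        for word, pos, label in sentence:
            # Tab-separated columns; CRF++ treats the last column as the answer tag.
            lines.append(f"{word}\t{pos}\t{label}")
        lines.append("")  # blank line marks the sentence boundary
    return "\n".join(lines) + "\n"

# "螢幕 很 清楚" ("the screen is very clear") with illustrative CKIP-style tags.
example = [[("螢幕", "Na", "O"), ("很", "Dfa", "O"), ("清楚", "VH", "OW")]]
data = to_crfpp(example)
print(data)
```

A file produced this way could be passed to CRF++ together with a feature template (`crf_learn template train.data model`), and new data could then be tagged with `crf_test -m model test.data`.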
3.1 Features
Contextual Part-of-Speech Features. Opinion words are generally adjectives; however, not all adjectives are opinion words, so we must also consider the context of the target word (the current token). Contextual part-of-speech features describe the POS tags of the words surrounding the current token. We refer to them as follows: $pos_i$ is the POS of the word at position $i$ relative to the target token, whose own POS is $pos_0$. Our system uses the window $pos_{-3}$ to $pos_{3}$.
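As an illustration of this window, the sketch below extracts $pos_{-3}$ through $pos_{3}$ for one token and, separately, emits CRF++ unigram template lines of the form `U..:%x[row,col]` (where `row` is a position relative to the current token and `col` is a column index) that expose the same window to CRF++. It assumes the window spans three positions on either side and that POS tags sit in column 1 of a (word, POS, label) data file; the padding symbol and function names are hypothetical.

```python
from typing import Dict, List

def contextual_pos_features(pos_tags: List[str], i: int, window: int = 3,
                            pad: str = "<PAD>") -> Dict[str, str]:
    """Return pos-3 ... pos+3 for the token at index i, padding past sentence edges."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        # Keys look like "pos-3", "pos+0", "pos+2".
        feats[f"pos{offset:+d}"] = pos_tags[j] if 0 <= j < len(pos_tags) else pad
    return feats

def crfpp_pos_template(window: int = 3, pos_column: int = 1) -> List[str]:
    """Emit one CRF++ unigram template line per offset in the POS window."""
    return [f"U{k:02d}:%x[{offset},{pos_column}]"
            for k, offset in enumerate(range(-window, window + 1))]

tags = ["Na", "Dfa", "VH"]  # illustrative POS tags for a three-word sentence
print(contextual_pos_features(tags, 2))
print("\n".join(crfpp_pos_template()))
```

The dictionary form is convenient for inspecting features or for toolkits that take feature maps; with CRF++ itself, the window is expressed only through the template lines.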
¹ http://ckipsvr.iis.sinica.edu.tw