Cross-Domain Opinion Word Identification
with Query-By-Committee Active Learning
Yi-Lin Tsai 1, Richard Tzong-Han Tsai 2,*,
Chuang-Hua Chueh 3, and Sen-Chia Chang 3
1 Department of ISA, National Tsinghua University, Hsinchu, Taiwan
s102065514@m102.nthu.edu.tw
2 Department of CSIE, National Central University, Chungli, Taiwan
thtsai@csie.ncu.edu.tw
3 Industrial Technology Research Institute, Hsinchu, Taiwan
{chchueh,chang}@itri.org.tw
Abstract. Opinion word identification (OWI) is an important task for
opinion mining. In OWI, it is necessary to find the exact positions of
opinion word mentions. Supervised learning approaches can locate such
mentions with high accuracy. To construct an OWI system for a new domain,
it is necessary to annotate sufficient amounts of data to represent
the new domain's characteristics. However, since annotating every new
domain extensively is costly, how to best utilize existing annotated data
is a very important challenge for mention-based OWI systems. In this
work, we propose a cross-domain OWI system. The query by committee
(QBC) active learning scheme is used to select controlled amounts of data
in the new domain for manual annotation. This new annotated data is
used to complement the existing annotated data of the original domain.
We compile three annotated datasets, each for one of three different do-
mains, and conduct domain adaptation experiments on all six domain
pairs. Our experiments show that by adding only 1,000 newly annotated
sentences from the new domain to the existing annotated data, our sys-
tem can achieve nearly the same level of accuracy as a system trained
on 10,000 annotated new-domain sentences. Our system with the QBC
active learning scheme also outperforms the same system with a random
selection scheme.
Keywords: opinion word identification, active learning, cross-domain.
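
The query-by-committee selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the committee is built or how disagreement is measured, so everything here is an assumption made for illustration: the committee members are hypothetical lexicon-based taggers, disagreement is measured with token-level vote entropy averaged over the sentence, and the names `vote_entropy`, `sentence_disagreement`, `select_for_annotation`, and `lexicon_tagger` are not taken from the paper.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's votes for one token (0 when unanimous)."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sentence_disagreement(tokens, committee):
    """Mean token-level vote entropy over a sentence (assumed measure)."""
    labelings = [member(tokens) for member in committee]
    per_token = [vote_entropy(votes) for votes in zip(*labelings)]
    return sum(per_token) / len(per_token) if per_token else 0.0


def select_for_annotation(pool, committee, budget):
    """Pick the `budget` sentences on which the committee disagrees most."""
    scored = [(sentence_disagreement(s, committee), i) for i, s in enumerate(pool)]
    # Highest disagreement first; ties are broken by original pool order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[i] for _, i in scored[:budget]]


def lexicon_tagger(lexicon):
    """Hypothetical committee member: tags a token 'OP' if it is in its lexicon."""
    return lambda tokens: ["OP" if t.lower() in lexicon else "O" for t in tokens]


# Toy committee standing in for models trained on the original domain.
committee = [
    lexicon_tagger({"great", "sharp"}),
    lexicon_tagger({"great"}),
    lexicon_tagger({"great", "sharp", "cheap"}),
]

# Toy unlabeled pool from the new domain, already tokenized.
pool = [
    ["great", "screen"],
    ["sharp", "and", "cheap"],
    ["the", "lens"],
]

selected = select_for_annotation(pool, committee, budget=1)
print(selected)  # [['sharp', 'and', 'cheap']]
```

In this toy pool only the second sentence contains tokens on which the three taggers vote differently, so it is the one sent for manual annotation. A random selection baseline would replace the scoring step with a uniform sample from the pool.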
1 Introduction
With the explosion of social media, blogs, and review sites, more and more cus-
tomer opinions are available online. These are beneficial to both sellers interested
in evaluating consumers' needs and shoppers looking for new products/services.
Opinion word identification (OWI) is a fundamental task in opinion mining.
According to their output, OWI approaches can be categorized into two main types:
list-based and mention-based.
Corresponding author.