List-based systems output a list of opinion words. Such systems are usually
either propagation-based or co-occurrence-based. Propagation-based approaches
have two main steps: sentiment seed collection and sentiment value propagation.
In the first step, seeds with accurate sentiment values are collected. Usually,
these seeds are manually annotated or collected from existing dictionaries. In
the second step, an existing word/phrase/concept graph is used as the foundation.
Sentiment values are propagated from seeds to the remaining parts of the
foundation graph [3, 9]. Co-occurrence-based approaches employ co-occurrence
statistics to estimate whether an opinion word candidate corresponds to a given opinion
target and vice versa [10, 6]. Both list-based approaches can construct opinion
word dictionaries without human annotation.
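
The two propagation steps can be illustrated with a minimal sketch. The toy graph, the seed words and their scores, and the neighbour-averaging update rule are all assumptions made for illustration; they are not the specific methods of [3] or [9].

```python
from collections import defaultdict


def propagate_sentiment(edges, seeds, iterations=10):
    """Spread seed sentiment values over an undirected word graph.

    Assumption: on every pass each non-seed word takes the mean score of
    its neighbours, while seed scores stay fixed.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Step 1: start from the seed values; every other word starts neutral.
    scores = {word: 0.0 for word in neighbours}
    scores.update(seeds)

    # Step 2: propagate values from the seeds through the graph.
    for _ in range(iterations):
        updated = dict(scores)
        for word, adjacent in neighbours.items():
            if word in seeds:
                continue  # seeds keep their annotated value
            updated[word] = sum(scores[n] for n in adjacent) / len(adjacent)
        scores = updated
    return scores


# Hypothetical foundation graph (e.g. synonym links) and seed dictionary.
edges = [
    ("good", "tasty"),
    ("tasty", "delicious"),
    ("bad", "bland"),
    ("bland", "tasteless"),
]
seeds = {"good": 1.0, "bad": -1.0}
scores = propagate_sentiment(edges, seeds)
```

In this toy graph, words connected to the positive seed drift toward positive scores and words connected to the negative seed drift toward negative ones, without any further annotation.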
List-based OWI, however, does not tell us much about the context in which
opinion words are used; it simply outputs a list of all the opinion words in a body
of text. To better understand opinion words in context, it is necessary to find
the exact sentence positions where the words are mentioned. One common way
of identifying the positions of opinion words in the output list is to match them
back against the text. All matched occurrences in the text are then regarded
as opinion mentions. The problem with this approach is that not all matched
positions are actual opinion mentions. For example, the word “美味/delicious”
would not necessarily represent an opinion in a review of a restaurant named
“美味餐廳/Delicious Restaurant”.
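
The false-positive problem can be seen in a minimal sketch of the match-back step. The review sentence and the one-word list are hypothetical, and simple case-insensitive substring matching is assumed.

```python
import re


def match_opinion_words(text, opinion_words):
    """Return (start, end, word) for every occurrence of a listed word in text."""
    matches = []
    for word in opinion_words:
        # re.escape keeps any special characters in the word literal.
        for m in re.finditer(re.escape(word), text, flags=re.IGNORECASE):
            matches.append((m.start(), m.end(), word))
    return sorted(matches)


# Hypothetical review text and opinion word list.
review = "We ate at Delicious Restaurant. The noodles were delicious."
matches = match_opinion_words(review, ["delicious"])
```

Both occurrences are returned, although the first is part of the restaurant's name rather than an opinion. Matching alone cannot separate the two.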
The mention-based approach is designed to identify and locate all opinion
mentions in reviews. Mention-based OWI is usually formulated as a sequence
labelling task in which tokens are either labelled as “opinion-word mention” or
“other” [11]. The approach can achieve high accuracy, but because it requires
large amounts of annotated data, construction of a mention-based OWI system
for a new domain can be costly in terms of human effort. One way to reduce
this cost is to adapt an existing system for use in a new domain. However,
cross-domain OWI poses its own problems, as the original domain data may
not be compatible with the new domain. Finding the optimal way to selectively
annotate sufficient data from the new domain is a critical challenge in cross-
domain OWI.
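
As a small illustration of the sequence-labelling formulation described above, each token carries one of two labels and the mentions are read off by position. The label names `OPINION` and `OTHER` and the example sentence are assumptions; the labels are written by hand here, whereas in practice they would come from a trained tagger.

```python
def extract_mentions(tokens, labels, target="OPINION"):
    """Return (token_index, token) for every token labelled as an opinion-word mention."""
    return [
        (i, token)
        for i, (token, label) in enumerate(zip(tokens, labels))
        if label == target
    ]


# Hypothetical tokenised sentence with hand-written labels.
tokens = ["Delicious", "Restaurant", "serves", "delicious", "noodles", "."]
labels = ["OTHER", "OTHER", "OTHER", "OPINION", "OTHER", "OTHER"]
mentions = extract_mentions(tokens, labels)
```

Because labels are assigned per position, the restaurant name and the genuine opinion word can receive different labels even though they share the same surface form.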
Active learning is a method employed in many NLP tasks to select new data.
For example, it has performed well in named-entity recognition [8] and sentiment
classification [5]. The objective of active learning is to use the least amount
of annotated data to achieve the highest performance. Query by Committee
(QBC) [7] is one of the most efficient active learning algorithms. The QBC
approach asks every model (committee member) to vote on every query's (data
instance's) label. Only the most uncertain instances (the most diversely labeled)
are selected for manual annotation. In this study, we propose a new cross-domain
opinion word extraction approach with QBC-based active learning. We adapt our
system from one of three source domains to one of three target domains. Our
system is tested on six source-target domain pairs in total. We review the related
research in Section 2 and illustrate our approach in Section 3. In Section 4, we
report our evaluation results. Our concluding remarks are given in Section 5.
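
The QBC selection step described above can be sketched as follows. Vote entropy is used here as the disagreement measure; this is one common choice and an assumption, since the text does not say which measure is used. The instance identifiers, labels, and committee size are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes; higher means more disagreement."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_queries(committee_votes, k):
    """Pick the k instances whose votes are most diverse.

    committee_votes maps an instance id to a list of labels,
    one label per committee member.
    """
    ranked = sorted(
        committee_votes,
        key=lambda instance: vote_entropy(committee_votes[instance]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical votes from a three-member committee on three instances.
committee_votes = {
    "s1": ["OPINION", "OPINION", "OPINION"],
    "s2": ["OPINION", "OTHER", "OPINION"],
    "s3": ["OTHER", "OTHER", "OTHER"],
}
selected = select_queries(committee_votes, k=1)
```

Only the instance on which the committee disagrees is sent for manual annotation; the unanimous instances are skipped.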