Table 4. Cross-domain opinion word identification performance

Setting                      A     P      R      F
restaurant movie      S      0     0.723  0.554  0.627
                      T      833   0.832  0.700  0.760
                      S+T    833   0.846  0.771  0.806
restaurant hotel      S      0     0.849  0.739  0.790
                      T      503   0.919  0.748  0.824
                      S+T    503   0.909  0.842  0.874
movie restaurant      S      0     0.930  0.239  0.380
                      T      780   0.909  0.837  0.871
                      S+T    780   0.922  0.865  0.893
movie hotel           S      0     0.952  0.352  0.514
                      T      643   0.920  0.763  0.834
                      S+T    643   0.918  0.829  0.871
hotel restaurant      S      0     0.923  0.675  0.780
                      T      653   0.907  0.827  0.866
                      S+T    653   0.922  0.869  0.895
hotel movie           S      0     0.755  0.639  0.692
                      T      873   0.833  0.704  0.763
                      S+T    873   0.848  0.777  0.811

As shown in Table 4, on average, our QBC-based system outperforms the
system trained on the source-domain data by 0.228 in F-measure. This significant
improvement is achieved by adding only 714 newly annotated target-domain
sentences. These results demonstrate that, when annotated data from other
domains are available, our QBC-based system is much more practical and
efficient than annotating the new-domain data from scratch.
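
The two averages quoted above can be recomputed from Table 4. The short Python sketch below does so under two assumptions that the text does not state explicitly: the S rows are the system trained on source-domain data only, the S+T rows are the QBC-based system, and column A is the number of newly annotated target-domain sentences. The helper f_measure assumes that F is the balanced harmonic mean of P and R; the variable names are illustrative.

```python
# Rows of Table 4: (domain pair, F for S, F for S+T, A for S+T).
# Assumption: S = source-only training, S+T = QBC-based system.
rows = [
    ("restaurant movie", 0.627, 0.806, 833),
    ("restaurant hotel", 0.790, 0.874, 503),
    ("movie restaurant", 0.380, 0.893, 780),
    ("movie hotel", 0.514, 0.871, 643),
    ("hotel restaurant", 0.780, 0.895, 653),
    ("hotel movie", 0.692, 0.811, 873),
]


def f_measure(p, r):
    """Balanced F-measure, assumed here to be the harmonic mean of P and R."""
    return 2 * p * r / (p + r)


# Average F gain of S+T over S, and average number of added sentences.
avg_gain = sum(f_st - f_s for _, f_s, f_st, _ in rows) / len(rows)
avg_added = sum(a for _, _, _, a in rows) / len(rows)

print(f"average F gain: {avg_gain:.3f}")
print(f"average annotated target sentences: {avg_added:.0f}")
```

Rounded, the two values agree with the 0.228 and 714 reported in the text.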
5 Conclusion
In this work, we propose a cross-domain opinion word identification (OWI)
system. Using the query by committee (QBC) active learning scheme, we select
controlled amounts of data from
the new domain for manual annotation to complement the annotated data of
the pre-existing domain. We compile three annotated datasets corresponding to
three different domains. Every dataset contains 10,000 sentences. We conduct
domain adaptation experiments on all six domain pairs. Our experiments show
that after only 1,000 annotated sentences from the new domain are added to
the pre-existing annotated data, our system can achieve approximately the same
accuracy as the system trained on 10,000 annotated sentences from the new domain. Our
system with the QBC active learning scheme also outperforms the same system
with random selection.
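
The selection step of a QBC scheme can be sketched as follows. This is a minimal illustration, not the system described in this work: the committee members, the sentence-level labels, and the use of vote entropy as the disagreement measure are all assumptions made for the example. Each hypothetical committee member is a function that labels a sentence, and the sentences on which the members disagree most are sent for manual annotation.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Disagreement among committee votes for one sentence (0 = unanimous)."""
    total = len(votes)
    counts = Counter(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_for_annotation(unlabeled, committee, budget):
    """Return the `budget` sentences with the highest committee disagreement."""
    scored = [
        (vote_entropy([member(sentence) for member in committee]), sentence)
        for sentence in unlabeled
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored[:budget]]


# Hypothetical committee: three toy keyword rules standing in for models
# trained on the pre-existing (source-domain) annotated data.
committee = [
    lambda s: "opinion" if "great" in s else "none",
    lambda s: "opinion" if "great" in s or "slow" in s else "none",
    lambda s: "opinion" if "slow" in s or "bland" in s else "none",
]

pool = [
    "the food was great",
    "service was slow",
    "we arrived at noon",
    "the soup was bland",
]

picked = select_for_annotation(pool, committee, budget=3)
print(picked)
```

In this toy pool, the committee is unanimous on "we arrived at noon", so that sentence is the one left out of the annotation batch. Random selection, the baseline mentioned above, would instead draw sentences without regard to disagreement.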