[Flowchart image not reproduced. Recoverable node labels: training, L_S, M_S / M_C, predict U_T, Selection, n, Human annotation, L_T, M_T, predict test data, criteria, Result, Finish.]
Fig. 1. Flowchart of our QBC-based active learning
Another issue in active learning is determining a suitable stopping criterion. Let S_2 be the F-measure of the current combined model (M_C) and S_1 the F-measure of the previous combined model. When S_2 - S_1 is lower than a threshold t, the active learning process stops.
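The stopping rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `should_stop` and `run_until_converged` are hypothetical, and the per-round F-measures are supplied as a plain list rather than produced by real models.

```python
def should_stop(prev_f: float, curr_f: float, threshold: float) -> bool:
    """Return True when the F-measure gain S_2 - S_1 is lower than t.

    prev_f is S_1 (previous combined model), curr_f is S_2 (current
    combined model), and threshold is t. All names are illustrative.
    """
    return (curr_f - prev_f) < threshold


def run_until_converged(f_scores, threshold):
    """Count the active learning rounds run before the criterion fires.

    f_scores stands in for the F-measure of the combined model after
    each round; in a real system each value would come from retraining.
    """
    rounds = 0
    prev = None
    for curr in f_scores:
        rounds += 1
        # The first round has no previous model to compare against.
        if prev is not None and should_stop(prev, curr, threshold):
            break
        prev = curr
    return rounds
```

Note that under this reading a drop in F-measure (a negative gain) also ends the process, since a negative difference is lower than any positive t.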
4 Experiments
4.1 Datasets
We compile three annotated datasets: 10,000 restaurant review sentences from (denoted as D_R), 10,000 movie review sentences from (denoted as D_M), and 10,000 hotel review sentences from (denoted as D_H). All review sentences are written in Chinese and annotated by two experts.
We conduct domain adaptation experiments on all C(3, 2) domain pairs. In each experiment, one dataset is chosen as the source-domain dataset (denoted as D_S), and the other is the target-domain dataset (denoted as D_T). We use all 10,000 sentences from D_S for training and randomly select 3,000 sentences from D_T for testing, repeating this selection 30 times. The remaining 7,000 D_T sentences are treated as the selection pool for active learning.
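The target-domain split described above might look like the following sketch. It assumes that each of the 30 repetitions draws its own 3,000-sentence test set and uses the complementary 7,000 sentences as that repetition's selection pool; the text does not spell this out, and the function names and seeding scheme are hypothetical.

```python
import random


def split_target(sentences, test_size=3000, seed=0):
    """Split target-domain sentences into a test set and a selection pool.

    A seeded shuffle of the indices makes each split reproducible.
    """
    rng = random.Random(seed)
    indices = list(range(len(sentences)))
    rng.shuffle(indices)
    test = [sentences[i] for i in indices[:test_size]]
    pool = [sentences[i] for i in indices[test_size:]]
    return test, pool


def repeated_splits(sentences, repeats=30, test_size=3000):
    """Draw `repeats` independent (test, pool) splits, one seed per run."""
    return [split_target(sentences, test_size, seed=r) for r in range(repeats)]
```

With 10,000 target sentences, each call to `split_target` yields 3,000 test sentences and a disjoint pool of 7,000.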
4.2 Evaluation Metrics
Results are reported as F-measures, defined as F = 2PR / (P + R), where P denotes the precision and R the recall of opinion word mentions. We sum the scores over all 30 tests and compute the averages for performance comparison. The results are reported as the mean precision (P), recall (R), and F-measure (F) over the thirty test sets.
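As a small worked illustration of the metric, the sketch below computes F = 2PR / (P + R) and averages the scores over a set of test runs. It assumes that F is computed per run and then averaged, rather than derived from the mean P and mean R; the helper names are hypothetical and the inputs are placeholders, not results from the experiments.

```python
def f_measure(precision, recall):
    """F = 2PR / (P + R); defined as 0.0 when P + R is zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_scores(runs):
    """Average P, R, and per-run F over a list of (P, R) pairs.

    Each scores is summed over all runs and divided by the run count.
    """
    n = len(runs)
    mean_p = sum(p for p, _ in runs) / n
    mean_r = sum(r for _, r in runs) / n
    mean_f = sum(f_measure(p, r) for p, r in runs) / n
    return mean_p, mean_r, mean_f
```

In the setting described here, `runs` would hold the 30 (P, R) pairs, one per randomly drawn test set.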