4.3 Experiment 1
In this experiment, we fix the number of annotated sentences of the target domain to 1,000. This amount of annotation requires one man-hour, which is a relatively small amount of human effort for domain adaptation.
Table 2 shows the performance of adapting our system from the restaurant domain to the movie domain. The baseline strategy randomly selects 1,000 sentences from the 10,000 target-domain sentences; its result is shown in row 1. The performance of the system trained on all 10,000 target-domain sentences is treated as the upper bound, while that of the system trained on the source-domain data only is regarded as the lower bound.
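The proposed approach chooses which target-domain sentences to send for annotation; later in this section the system is described as QBC-based. The committee construction and disagreement measure are not given here, so the following is only an illustrative sketch of committee-based selection using sentence-level vote entropy, with the `predict` interface and committee members assumed:

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Committee disagreement for one item: 0 when all members agree,
    maximal when the votes are evenly split (higher = more informative)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_for_annotation(pool, committee, k):
    """Pick the k pool sentences the committee disagrees on most.
    Each committee member is assumed to expose predict(sentence) -> label."""
    ranked = sorted(pool,
                    key=lambda s: vote_entropy([m.predict(s) for m in committee]),
                    reverse=True)
    return ranked[:k]
```

For sequence labeling with CRFs, disagreement would more plausibly be aggregated over per-token label votes; the sentence-level version above is only the simplest form of the idea.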
Figure 2 shows the cross-domain performance of our system for all six source-target domain pairs. The vertical axis corresponds to the F-measure and the horizontal axis to the number of manually annotated target-domain sentences (A). We can see that the F-measure grows with the number of human-annotated sentences.
Table 2. Performance in different selection settings (Restaurant → Movie)

                                                P      R      F       A
Random (baseline)                             0.836  0.727  0.777   1,000
Proposed approach                             0.852  0.784  0.816   1,000
Trained on all D_T sentences (Upper bound)    0.877  0.841  0.858  10,000
Trained on all D_S sentences (Lower bound)    0.723  0.554  0.627       0
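The F column in Table 2 is the harmonic mean of the corresponding P and R values; a quick check (values taken from the table above, with a tolerance allowing for rounding to three decimals):

```python
def f_measure(p, r):
    """F1: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# (P, R, reported F) rows from Table 2; every reported F matches to 3 decimals.
for p, r, f in [(0.836, 0.727, 0.777), (0.852, 0.784, 0.816),
                (0.877, 0.841, 0.858), (0.723, 0.554, 0.627)]:
    assert abs(f_measure(p, r) - f) < 1e-3
```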
In the beginning, performance improves greatly as more annotated data is added. However, after around 200 annotated sentences have been added, the performance gain tapers off, though it remains consistently above that of random selection.
Unlike the other domain pairs, the F-measures of our QBC-based system for the movie-restaurant pair (Figure 2a) and the movie-hotel pair (Figure 2f) increase sharply after 100 annotated sentences are added to the training set. To investigate the reason, we further analyze the overlap ratio of every target-source domain pair. Table 3 shows the ratio of D_T tokens appearing in D_S. We can see that the two domain pairs with the lowest ratios are (movie, restaurant) and (movie, hotel). This explains why the CRF-based system achieves low F-measures when using only D_S, but after a few D_T sentences are added, its F-measures increase rapidly.
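The overlap statistic of Table 3, the ratio of D_T tokens appearing in D_S, is straightforward to compute. Whether the count is over token occurrences or distinct token types is not stated here, so this sketch (with whitespace tokenization as a further assumption) uses occurrences:

```python
def token_overlap_ratio(target_sents, source_sents):
    """Fraction of target-domain token occurrences whose token also appears
    somewhere in the source domain. Whitespace tokenization is an assumption."""
    source_vocab = {tok for sent in source_sents for tok in sent.split()}
    target_tokens = [tok for sent in target_sents for tok in sent.split()]
    if not target_tokens:
        return 0.0
    return sum(tok in source_vocab for tok in target_tokens) / len(target_tokens)
```

A low ratio means many target-domain tokens were never seen in the source training data, which is consistent with the low D_S-only F-measures observed for the movie pairs.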
4.4 Experiment 2
In this experiment, we compare the performance of configurations using source-domain plus target-domain data to that of configurations using only target-domain data and only source-domain data. S denotes the system trained on the 10,000 D_S sentences, which is considered the lower bound. T denotes the system trained on the newly annotated D_T sentences only. The S+T configuration is