4.3 Experiment 1
In this experiment, we fix the number of annotated sentences of the target domain to 1,000. This amount of annotation requires one man-hour, which is a relatively small amount of human effort for domain adaptation.
Table 2 shows the performance of adapting our system from the restaurant domain to the movie domain. The baseline strategy randomly selects 1,000 sentences from the 10,000 target-domain sentences; its result is shown in row 1. The performance of the system trained on all 10,000 target-domain sentences is treated as the upper bound, while that of the system trained on the source-domain data only is regarded as the lower bound.
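The proposed approach chooses which target-domain sentences to send for annotation; later in this section the system is described as QBC-based. The committee construction and disagreement measure are not given here, so the following is only an illustrative sketch of committee-based selection using sentence-level vote entropy, with the `predict` interface and committee members assumed:

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Committee disagreement for one item: 0 when all members agree,
    maximal when the votes are evenly split (higher = more informative)."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_for_annotation(pool, committee, k):
    """Pick the k pool sentences the committee disagrees on most.
    Each committee member is assumed to expose predict(sentence) -> label."""
    ranked = sorted(pool,
                    key=lambda s: vote_entropy([m.predict(s) for m in committee]),
                    reverse=True)
    return ranked[:k]
```

For sequence labeling with CRFs, disagreement would more plausibly be aggregated over per-token label votes; the sentence-level version above is only the simplest form of the idea.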
Figure 2 shows the cross-domain performance of our system for all six source-target domain pairs. The vertical axis corresponds to the F-measure and the horizontal axis to the number of manually annotated target-domain sentences (A). We can see that the F-measure grows with the number of human-annotated sentences.
Table 2. Performance in different selection settings (Restaurant → Movie)

                                                P      R      F       A
Random (baseline)                             0.836  0.727  0.777   1,000
Proposed approach                             0.852  0.784  0.816   1,000
Trained on all D_T sentences (Upper bound)    0.877  0.841  0.858  10,000
Trained on all D_S sentences (Lower bound)    0.723  0.554  0.627       0
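The F column in Table 2 is the harmonic mean of the corresponding P and R values; a quick check (values taken from the table above, with a tolerance allowing for rounding to three decimals):

```python
def f_measure(p, r):
    """F1: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# (P, R, reported F) rows from Table 2; every reported F matches to 3 decimals.
for p, r, f in [(0.836, 0.727, 0.777), (0.852, 0.784, 0.816),
                (0.877, 0.841, 0.858), (0.723, 0.554, 0.627)]:
    assert abs(f_measure(p, r) - f) < 1e-3
```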
In the beginning, performance improves greatly as more annotated data is added. However, after around 200 annotated sentences have been added, the performance gain tapers off, though it remains consistently above that of random selection.
Unlike the other domain pairs, the F-measures of our QBC-based system for the movie-restaurant pair (Figure 2a) and the movie-hotel pair (Figure 2f) increase sharply after 100 annotated sentences are added to the training set. To investigate the reason, we further analyze the overlap ratio of every target-source domain pair. Table 3 shows the ratio of D_T tokens appearing in D_S. We can see that the two domain pairs with the lowest ratios are (movie, restaurant) and (movie, hotel). This explains why the CRF-based system achieves low F-measures when using only D_S, but after a few D_T sentences are added, its F-measures increase rapidly.
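The overlap statistic of Table 3, the ratio of D_T tokens appearing in D_S, is straightforward to compute. Whether the count is over token occurrences or distinct token types is not stated here, so this sketch (with whitespace tokenization as a further assumption) uses occurrences:

```python
def token_overlap_ratio(target_sents, source_sents):
    """Fraction of target-domain token occurrences whose token also appears
    somewhere in the source domain. Whitespace tokenization is an assumption."""
    source_vocab = {tok for sent in source_sents for tok in sent.split()}
    target_tokens = [tok for sent in target_sents for tok in sent.split()]
    if not target_tokens:
        return 0.0
    return sum(tok in source_vocab for tok in target_tokens) / len(target_tokens)
```

A low ratio means many target-domain tokens were never seen in the source training data, which is consistent with the low D_S-only F-measures observed for the movie pairs.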
4.4 Experiment 2
In this experiment, we compare the performance of configurations using source-domain plus target-domain data to that of configurations using only target-domain data and only source-domain data. S denotes the system trained on the 10,000 D_S sentences, which is considered the lower bound. T denotes the system trained on the newly annotated D_T sentences only. The S+T configuration is