Co-Training - Introduction to Semi-Supervised Learning

CHAPTER 4. CO-TRAINING

Because these latter instances are not covered by the two labeled instances in our training sample, a supervised learner will not be able to classify them correctly. It seems that a very large labeled training sample is necessary to cover all the variations in location or person expressions. Or is it?

4.2 CO-TRAINING

It turns out that one does not need a large labeled training sample for this task. It is sufficient to have a large unlabeled training sample, which is much easier to obtain. Let us say we have the following unlabeled instances:

instance 3: ... headquartered in (Kazakhstan) ...
instance 4: ... flew to (Kazakhstan) ...
instance 5: ... (Mr. Smith), a partner at Steptoe & Johnson ...

It is illustrative to inspect the features of the labeled and unlabeled instances together:

instance  x(1)              x(2)              y
1.        Washington State  headquartered in  Location
2.        Mr. Washington    vice president    Person
3.        Kazakhstan        headquartered in  ?
4.        Kazakhstan        flew to           ?
5.        Mr. Smith         partner at        ?

One may reason about the data in the following steps:

1. From labeled instance 1, we learn that “headquartered in” is a context that seems to indicate y = Location.
2. If this is true, we infer that “Kazakhstan” must be a Location since it appears with the same context “headquartered in” in instance 3.
3. Since instance 4 is also about “Kazakhstan,” it follows that its context “flew to” should indicate Location.
4. At this point, we are able to classify “China” in “flew to (China)” as a Location, even though neither “flew to” nor “China” appeared in the labeled data!
5. Similarly, by matching “Mr. *” in instances 2 and 5, we learn that “partner at” is a context for y = Person. This allows us to classify “(Robert Jordan), a partner at” as Person, too.
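
The five reasoning steps above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the co-training algorithm itself: it uses exact lookup tables instead of trained classifiers, applies no confidence threshold, and the names entity_key, propagate, and classify are hypothetical helpers introduced here. The "Mr. *" pattern feature is taken from step 5.

```python
# Toy data from the table: (x1 = entity string, x2 = context, y = label).
labeled = [
    ("Washington State", "headquartered in", "Location"),
    ("Mr. Washington", "vice president", "Person"),
]
unlabeled = [
    ("Kazakhstan", "headquartered in"),
    ("Kazakhstan", "flew to"),
    ("Mr. Smith", "partner at"),
]


def entity_key(x1):
    """Hypothetical view-1 feature: collapse any 'Mr. <name>' to 'Mr. *'."""
    return "Mr. *" if x1.startswith("Mr. ") else x1


def propagate(labeled, unlabeled):
    """Alternate between the two views until no new label can be inferred."""
    entity_label = {}   # view 1: entity feature -> label
    context_label = {}  # view 2: context string -> label
    for x1, x2, y in labeled:
        entity_label[entity_key(x1)] = y
        context_label[x2] = y
    changed = True
    while changed:
        changed = False
        for x1, x2 in unlabeled:
            key = entity_key(x1)
            # A known context labels a previously unseen entity (step 2).
            if x2 in context_label and key not in entity_label:
                entity_label[key] = context_label[x2]
                changed = True
            # A known entity labels a previously unseen context (steps 3 and 5).
            if key in entity_label and x2 not in context_label:
                context_label[x2] = entity_label[key]
                changed = True
    return entity_label, context_label


def classify(x1, x2, entity_label, context_label):
    """Predict from the entity view first, then fall back to the context view."""
    return entity_label.get(entity_key(x1)) or context_label.get(x2)


entity_label, context_label = propagate(labeled, unlabeled)
print(classify("China", "flew to", entity_label, context_label))
print(classify("Robert Jordan", "partner at", entity_label, context_label))
```

On this toy data the two lookup tables take turns teaching each other, which is the informal pattern the steps describe. A real system would replace the tables with learned classifiers and would only pass along predictions it is confident about.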

This process bears a strong resemblance to the self-training algorithm in Section 2.5, where a classifier uses its most confident predictions on unlabeled instances to teach itself. There is a critical difference, though: we implicitly used two classifiers in turn. They operate on different views of an instance: one is based on the named entity string itself (x(1)), and the other is based on the context
