36 CHAPTER 4. CO-TRAINING
Because these latter instances are not covered by the two labeled instances in our training sample, a supervised learner will not be able to classify them correctly. It seems that a very large labeled training sample is necessary to cover all the variations in location or person expressions. Or is it?
4.2 CO-TRAINING
It turns out that one does not need a large labeled training sample for this task. It is sufficient to have a large unlabeled training sample, which is much easier to obtain. Let us say we have the following unlabeled instances:
instance 3: ... headquartered in (Kazakhstan) ...
instance 4: ... flew to (Kazakhstan) ...
instance 5: ... (Mr. Smith), a partner at Steptoe & Johnson ...
It is illustrative to inspect the features of the labeled and unlabeled instances together:
instance   x^(1)              x^(2)              y
1.         Washington State   headquartered in   Location
2.         Mr. Washington     vice president     Person
3.         Kazakhstan         headquartered in   ?
4.         Kazakhstan         flew to            ?
5.         Mr. Smith          partner at         ?
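The table can be held in a small data structure that keeps the two views of each instance apart. The sketch below is a minimal illustration, assuming Python and hypothetical names (Instance, DATA, shared_views) that do not come from the text; it lists which instances share an identical entity string or an identical context, which are exactly the links the reasoning below exploits.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instance(NamedTuple):
    """One named-entity occurrence, split into two views (hypothetical type)."""
    name: str             # view 1, x^(1): the named entity string itself
    context: str          # view 2, x^(2): the surrounding context
    label: Optional[str]  # y, or None if the instance is unlabeled


DATA = [
    Instance("Washington State", "headquartered in", "Location"),
    Instance("Mr. Washington", "vice president", "Person"),
    Instance("Kazakhstan", "headquartered in", None),
    Instance("Kazakhstan", "flew to", None),
    Instance("Mr. Smith", "partner at", None),
]


def shared_views(data: List[Instance]) -> List[Tuple[int, int, str]]:
    """Return (i, j, view) for 1-based instance pairs with an identical name or context."""
    links = []
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if data[i].name == data[j].name:
                links.append((i + 1, j + 1, "name"))
            if data[i].context == data[j].context:
                links.append((i + 1, j + 1, "context"))
    return links


print(shared_views(DATA))
```

Running it prints [(1, 3, 'context'), (3, 4, 'name')]. Exact matching misses the link between instances 2 and 5, which share only the pattern "Mr. *"; capturing that link requires a coarser feature on the first view.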
One may reason about the data in the following steps:
1. From labeled instance 1, we learn that “headquartered in” is a context that seems to indicate y = Location.
2. If this is true, we infer that “Kazakhstan” must be a Location since it appears with the same context “headquartered in” in instance 3.
3. Since instance 4 is also about “Kazakhstan,” it follows that its context “flew to” should indicate Location.
4. At this point, we are able to classify “China” in “flew to (China)” as a Location, even though neither “flew to” nor “China” appeared in the labeled data!
5. Similarly, by matching “Mr. *” in instances 2 and 5, we learn that “partner at” is a context for y = Person. This allows us to classify “(Robert Jordan), a partner at” as Person, too.
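The five steps can be mimicked by a toy rule-propagation loop in which a label found through one view is copied onto the other view's feature, so that view can label further instances. This is a minimal sketch under stated assumptions, not the co-training algorithm itself: the helper names (name_feature, classify, propagate) are hypothetical, labels are copied deterministically rather than chosen by classifier confidence, and the only generalization on the first view is the hypothetical rule that names starting with "Mr. " map to the feature "Mr. *".

```python
from typing import Dict, List, Optional, Tuple


def name_feature(name: str) -> str:
    """View 1 feature (hypothetical): collapse 'Mr. <surname>' to the pattern 'Mr. *'."""
    return "Mr. *" if name.startswith("Mr. ") else name


def classify(name: str, ctx: str,
             name_rules: Dict[str, str],
             context_rules: Dict[str, str]) -> Optional[str]:
    """Ask the name view first, then the context view; None if neither knows."""
    return name_rules.get(name_feature(name)) or context_rules.get(ctx)


def propagate(labeled: List[Tuple[str, str, str]],
              unlabeled: List[Tuple[str, str]]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Repeatedly label pending instances, copying each new label to both views."""
    name_rules: Dict[str, str] = {}
    context_rules: Dict[str, str] = {}
    for name, ctx, y in labeled:  # both views learn from the labeled data
        name_rules[name_feature(name)] = y
        context_rules[ctx] = y
    pending = list(unlabeled)
    changed = True
    while changed:  # stop once a full pass labels nothing new
        changed = False
        for name, ctx in list(pending):
            y = classify(name, ctx, name_rules, context_rules)
            if y is None:
                continue
            # Whichever view fired, record the label for BOTH views' features,
            # e.g. "Kazakhstan" learned via "headquartered in" later teaches "flew to".
            name_rules.setdefault(name_feature(name), y)
            context_rules.setdefault(ctx, y)
            pending.remove((name, ctx))
            changed = True
    return name_rules, context_rules


labeled = [("Washington State", "headquartered in", "Location"),
           ("Mr. Washington", "vice president", "Person")]
unlabeled = [("Kazakhstan", "headquartered in"),
             ("Kazakhstan", "flew to"),
             ("Mr. Smith", "partner at")]

name_rules, context_rules = propagate(labeled, unlabeled)
print(classify("China", "flew to", name_rules, context_rules))
print(classify("Robert Jordan", "partner at", name_rules, context_rules))
```

With the rules learned from instances 1 through 5, the two calls print Location and Person, matching steps 4 and 5. When the two views disagree, this sketch simply trusts the name view; a real learner would need a principled way to resolve such conflicts.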
This process bears a strong resemblance to the self-training algorithm in Section 2.5, where a
classifier uses its most confident predictions on unlabeled instances to teach itself. There is a critical
difference, though: we implicitly used two classifiers in turn. They operate on different views of an instance: one is based on the named entity string itself (x^(1)), and the other is based on the context