The success of regularized risk minimization depends on the regularizer Ω(f). Different
regularizers imply different assumptions of the task. For example, a commonly used regularizer
for f(x) = w^⊤x is

    \Omega(f) = \frac{1}{2} \|w\|^2 .

This particular regularizer penalizes the squared norm of the parameters w. It is helpful to view f as a point whose coordinates are determined by w in the
parameter space. An equivalent form for the optimization problem in (4.3) is

    \min_{f \in \mathcal{F}} R(f) \quad \text{subject to} \quad \Omega(f) \le s,

where s is determined by λ. It becomes clear that the regularizer constrains the radius of the ball in
the parameter space. Within the ball, the f that best fits the training data is chosen. This controls
the complexity of f, and prevents overfitting.
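To make the two forms concrete, here is a minimal sketch for a linear model f(x) = w^⊤x with squared loss: the penalized form follows the gradient of the risk plus λ·½‖w‖², while the constrained form takes unregularized gradient steps and projects w back onto the ball ½‖w‖² ≤ s. The function names, learning rate, and step count are illustrative assumptions, not from the text.

```python
import numpy as np

def fit_penalized(X, y, lam=0.1, lr=0.01, steps=1000):
    # Penalized form of (4.3): minimize mean squared loss + lam * 0.5 * ||w||^2.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n + lam * w  # risk gradient + penalty gradient
        w -= lr * grad
    return w

def fit_constrained(X, y, s=1.0, lr=0.01, steps=1000):
    # Equivalent constrained form: minimize the empirical risk subject to
    # 0.5 * ||w||^2 <= s, via projected gradient descent.
    n, d = X.shape
    w = np.zeros(d)
    radius = np.sqrt(2 * s)                 # 0.5||w||^2 <= s  <=>  ||w|| <= sqrt(2s)
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / n # unregularized risk gradient
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm              # project back into the ball
    return w
```

For a convex loss, each λ corresponds to a radius s that yields the same minimizer; the projected version simply makes the ball picture explicit.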
Importantly, for semi-supervised learning, one can often define the regularizer Ω(f) using
the unlabeled data. For example,

    \Omega(f) = \Omega_{SL}(f) + \lambda \, \Omega_{SSL}(f),    (4.4)

where Ω_SL(f) is a supervised regularizer, and Ω_SSL(f) is a semi-supervised regularizer that depends
on the unlabeled data. When Ω_SSL(f) indeed fits the task, such regularization can produce a better
f than that produced by Ω_SL(f) alone. We will next show how to define Ω_SSL(f) to encourage
agreement among multiple hypotheses, and discuss other forms of Ω_SSL(f) in later chapters.
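As a small illustration of (4.4), the sketch below assembles a composite regularizer for two linear hypotheses, using the squared-norm penalty as Ω_SL and, anticipating the agreement idea introduced next, a disagreement measure on unlabeled data as Ω_SSL. The names omega_sl / omega_ssl and the particular disagreement measure are assumptions for illustration.

```python
import numpy as np

def omega_sl(w):
    # Supervised regularizer: the squared-norm penalty 0.5 * ||w||^2.
    return 0.5 * np.dot(w, w)

def omega_ssl(w1, w2, X_u):
    # Semi-supervised regularizer built from unlabeled data X_u:
    # mean squared disagreement between two linear hypotheses.
    return np.mean((X_u @ w1 - X_u @ w2) ** 2)

def omega(w1, w2, X_u, lam=0.5):
    # Composite regularizer of Eq. (4.4): Omega_SL + lam * Omega_SSL,
    # with the supervised penalty applied to each hypothesis.
    return omega_sl(w1) + omega_sl(w2) + lam * omega_ssl(w1, w2, X_u)
```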
We are now ready to introduce multiview learning. We assume the algorithm has access to
k separate learners. It is possible, but not necessary, for each learner to use a subset of the features
of an instance x. This is the generalization of Co-Training to k views, hence the name multiview.
Alternatively, the learners might be of different types (e.g., decision tree, neural network, etc.) but
take the same features of x as input. This is similar to the so-called ensemble method. In either
case, the goal is for the k learners to produce hypotheses f_1, ..., f_k that minimize the following
regularized risk:
    (f_1, \ldots, f_k) = \operatorname*{argmin}_{f_1, \ldots, f_k} \sum_{v=1}^{k} \left[ \sum_{i=1}^{l} c(x_i, y_i, f_v(x_i)) + \lambda_1 \Omega_{SL}(f_v) \right]
                         + \lambda_2 \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i)).    (4.5)
The intuition is that each hypothesis should not only minimize its own empirical risk, but also agree
with all the other hypotheses. The first part of the multiview regularized risk is simply the sum
of individual (supervised) regularized risks. The second part defines a semi-supervised regularizer,
which measures the disagreement of those k hypotheses on unlabeled instances:
    \Omega_{SSL}(f_1, \ldots, f_k) = \sum_{u,v=1}^{k} \sum_{i=l+1}^{l+u} c(x_i, f_u(x_i), f_v(x_i)).    (4.6)
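Putting (4.5) and (4.6) together, here is a sketch that fits k linear hypotheses on the same features (the ensemble-style variant described above), using squared loss for both the supervised and the disagreement terms, by joint gradient descent. This is one plausible instantiation under those assumptions, not the book's prescribed algorithm; all names and hyperparameters are illustrative.

```python
import numpy as np

def multiview_fit(X_l, y, X_u, k=3, lam1=0.1, lam2=0.01, lr=1e-3, steps=2000, seed=0):
    # Joint gradient descent on Eq. (4.5) with linear hypotheses
    # f_v(x) = w_v^T x and squared loss c(x, y, f(x)) = (y - f(x))^2.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(k, X_l.shape[1]))  # one weight row per learner
    for _ in range(steps):
        P = X_u @ W.T               # predictions on unlabeled points, shape (n_u, k)
        mean_p = P.mean(axis=1)     # average prediction across the k learners
        G = np.empty_like(W)
        for v in range(k):
            # Supervised part: squared loss on labeled data + lam1 * 0.5 * ||w_v||^2.
            G[v] = 2 * X_l.T @ (X_l @ W[v] - y) + lam1 * W[v]
            # Disagreement part, Eq. (4.6): each pair (u, v) appears in both orders,
            # so the gradient w.r.t. w_v is 4k * X_u^T (f_v's predictions - mean).
            G[v] += lam2 * 4 * k * X_u.T @ (P[:, v] - mean_p)
        W -= lr * G
    return W

# Toy usage with hypothetical data: 20 labeled, 100 unlabeled points in 5 dimensions.
rng = np.random.default_rng(1)
X_l, y, X_u = rng.normal(size=(20, 5)), rng.normal(size=20), rng.normal(size=(100, 5))
W = multiview_fit(X_l, y, X_u)      # W[v] is hypothesis f_v
```

With lam2 = 0 the learners decouple into k independent regularized fits; increasing lam2 trades labeled-data fit for agreement on the unlabeled data.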