The pairwise disagreement is defined as the loss on an unlabeled instance $x_i$ when pretending $f_u(x_i)$ is the label and $f_v(x_i)$ is the prediction. Such disagreement is to be minimized. The final prediction for input $x$ is the label least objected to by all the hypotheses:

$$y(x) = \operatorname*{argmin}_{y \in \mathcal{Y}} \; \sum_{v=1}^{k} c\big(x, y, f_v(x)\big). \tag{4.7}$$
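As a minimal sketch (our own illustration; the function and argument names are hypothetical, assuming each hypothesis $f_v$ is a callable and `loss` plays the role of $c$), the prediction rule (4.7) can be written as:

```python
def multiview_predict(x, hypotheses, labels, loss):
    """Prediction rule (4.7): return the label that the k hypotheses
    object to the least, i.e., the argmin of the summed losses."""
    return min(labels, key=lambda y: sum(loss(x, y, f(x)) for f in hypotheses))
```

For the 0/1 loss $c(x, y, f(x)) = \mathbb{1}[y \neq f(x)]$, this rule reduces to a plurality vote among the $k$ hypotheses.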
Different choices of $c$ and $\Omega_{\text{SL}}$ lead to different instantiations of multiview learning. We give a concrete example below.
Example 4.6. Two-View Linear Ridge Regression. Let each instance have two views $x = [x^{(1)}, x^{(2)}]$. Consider two linear regression functions $f^{(1)}(x) = w^\top x^{(1)}$ and $f^{(2)}(x) = v^\top x^{(2)}$. Let the loss function be $c(x, y, f(x)) = (y - f(x))^2$. Let the supervised regularizers be $\Omega_{\text{SL}}(f^{(1)}) = \|w\|^2$ and $\Omega_{\text{SL}}(f^{(2)}) = \|v\|^2$. This particular form of regularization, i.e., penalizing the $\ell_2$ norm of the parameter, is known as ridge regression. The regularized risk minimization problem is

$$\min_{w, v} \; \sum_{i=1}^{l} \left(y_i - w^\top x_i^{(1)}\right)^2 + \sum_{i=1}^{l} \left(y_i - v^\top x_i^{(2)}\right)^2 + \lambda_1 \|w\|^2 + \lambda_1 \|v\|^2 + \lambda_2 \sum_{i=l+1}^{l+u} \left(w^\top x_i^{(1)} - v^\top x_i^{(2)}\right)^2. \tag{4.8}$$
The solution can be found by setting the gradient to zero and solving a system of linear equations.
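This last step can be made concrete with a short NumPy sketch (our own illustration; the function and variable names are hypothetical). Setting the gradient of (4.8) with respect to $w$ and $v$ to zero yields a symmetric block-linear system, which the snippet assembles and solves:

```python
import numpy as np

def two_view_ridge(X1, X2, y, U1, U2, lam1, lam2):
    """Closed-form solution of the two-view ridge problem (4.8).

    X1 (l x d1), X2 (l x d2): labeled instances, views 1 and 2
    U1 (u x d1), U2 (u x d2): unlabeled instances, views 1 and 2
    y (l,): labels; lam1: ridge weight; lam2: agreement weight
    """
    d1, d2 = X1.shape[1], X2.shape[1]
    # Gradient of (4.8) set to zero:
    #   (X1'X1 + lam1 I + lam2 U1'U1) w - lam2 U1'U2 v = X1'y
    #   (X2'X2 + lam1 I + lam2 U2'U2) v - lam2 U2'U1 w = X2'y
    A = X1.T @ X1 + lam1 * np.eye(d1) + lam2 * (U1.T @ U1)
    B = -lam2 * (U1.T @ U2)
    C = X2.T @ X2 + lam1 * np.eye(d2) + lam2 * (U2.T @ U2)
    M = np.block([[A, B], [B.T, C]])
    rhs = np.concatenate([X1.T @ y, X2.T @ y])
    wv = np.linalg.solve(M, rhs)
    return wv[:d1], wv[d1:]
```

For $\lambda_1 > 0$ the block matrix is symmetric positive definite, so the system has a unique solution.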
What is the assumption behind multiview learning? In a regularized risk framework, the assumption is encoded in the regularizer $\Omega_{\text{SSL}}$ (4.6) to be minimized. That is, multiple hypotheses $f_1, \ldots, f_k$ should agree with each other. However, agreement alone is not sufficient. Consider the following counter-example: replicate the feature $k$ times to create $k$ identical "views," and replicate the hypotheses $f_1 = \ldots = f_k$. By definition they all agree, but this does not guarantee that they are any better than single-view learning (in fact, the two are the same). The key insight is that the set of agreeing hypotheses must additionally be a small subset of the hypothesis space $\mathcal{F}$. In contrast, the duplicated hypotheses in the counter-example still occupy the whole hypothesis space $\mathcal{F}$.
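This claim can be checked numerically with the `two_view_ridge` sketch above (again our own illustration): with duplicated views, the multiview solution collapses to ordinary single-view ridge regression.

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))   # one view, duplicated below
U = rng.normal(size=(50, 3))
y = rng.normal(size=10)
lam1, lam2 = 0.5, 2.0
w, v = two_view_ridge(X, X, y, U, U, lam1, lam2)
# Single-view ridge solves (X'X + lam1 I) w = X'y
w_ridge = np.linalg.solve(X.T @ X + lam1 * np.eye(3), X.T @ y)
print(np.allclose(w, v), np.allclose(w, w_ridge))  # True True
```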
Remark 4.7. Multiview Learning Assumption. Multiview learning is effective when a set of hypotheses $f_1, \ldots, f_k$ agree with each other. Furthermore, there are not many such agreeing sets, and the agreeing set happens to have a small empirical risk.
This concludes our discussion of co-training and multiview learning techniques. These models
use multiple views or classifiers, in conjunction with unlabeled data, in order to reduce the size of
the hypothesis space. We also introduced the regularized risk minimization framework for machine
learning, which will appear again in the next two chapters on graph-based methods and semi-
supervised support vector machines.