5.5 MANIFOLD REGULARIZATION
Both Mincut and the harmonic function are transductive learning algorithms. They each learn a
function f that is restricted to the labeled and unlabeled vertices in the graph. There is no direct
way to predict the label of a previously unseen test instance x, unless one adds x as a new vertex to the graph and repeats the computation. This is clearly undesirable if we want predictions on a
large number of test instances. What we need is an inductive semi-supervised learning algorithm.
In addition, both Mincut and the harmonic function fix f(x) = y for labeled instances. What if
some of the labels are wrong? It is not uncommon for real datasets to have such label noise. We want
f to be able to occasionally disagree with the given labels.
Manifold regularization addresses these two issues. It is an inductive learning algorithm because it defines f on the whole feature space: f : X → ℝ. f is regularized to be smooth with respect
to the graph by the graph Laplacian as in (5.12). However, this regularizer alone only controls f, the values of f on the l + u training instances. To prevent f from being too wiggly (and thus having inferior generalization performance) outside the training samples, it is necessary to include a second regularization term, such as ‖f‖² = ∫_{x∈X} f(x)² dx. Putting them together, the regularizer for manifold regularization becomes

    \Omega(f) = \lambda_1 \|f\|^2 + \lambda_2 \mathbf{f}^\top L \mathbf{f} ,    (5.17)
where λ₁, λ₂ ≥ 0 control the relative strength of the two terms. To allow f to disagree with the given labels, we can simply use the loss function c(x, y, f(x)) = (y − f(x))². This loss function does not greatly penalize small deviations. Other loss functions are possible, too, for example the hinge loss
that we will introduce in Chapter 6. The complete manifold regularization problem is
    \min_{f : X \to \mathbb{R}} \; \sum_{i=1}^{l} (y_i - f(x_i))^2 + \lambda_1 \|f\|^2 + \lambda_2 \mathbf{f}^\top L \mathbf{f} .    (5.18)
The so-called representer theorem guarantees that the optimal f admits a finite (l + u, to be exact) dimensional representation. There exist efficient algorithms to find the optimal f.
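As a concrete illustration, the minimal sketch below (Python with NumPy; all function and variable names are illustrative) shows one way such an algorithm can look. It assumes the ‖f‖² term is taken to be the squared norm of f in the reproducing kernel Hilbert space of an RBF kernel k, a common substitute for the integral penalty above; the representer theorem then gives f(x) = Σ_{j=1}^{l+u} α_j k(x_j, x), and setting the gradient of (5.18) with respect to α to zero reduces the problem to a single (l + u) × (l + u) linear system.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """Gram matrix with entries exp(-gamma * ||a - b||^2)."""
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq)

    def fit_manifold_regularization(X, y_l, W, lam1=1e-2, lam2=1e-1, gamma=1.0):
        """Solve (5.18) with f(x) = sum_j alpha_j k(x_j, x) (representer theorem).

        X   : (l+u, d) array, labeled instances first, then unlabeled ones
        y_l : (l,)     labels for the first l rows of X
        W   : (l+u, l+u) symmetric graph weight matrix over all of X
        """
        n, l = X.shape[0], y_l.shape[0]
        K = rbf_kernel(X, X, gamma)                       # kernel Gram matrix
        L = np.diag(W.sum(axis=1)) - W                    # unnormalized graph Laplacian
        J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])   # selects the labeled rows
        y = np.r_[y_l, np.zeros(n - l)]                   # labels padded with zeros
        # Setting the gradient of (5.18) w.r.t. alpha to zero (under the RKHS-norm
        # assumption above) gives the system (J K + lam1 I + lam2 L K) alpha = y.
        alpha = np.linalg.solve(J @ K + lam1 * np.eye(n) + lam2 * L @ K, y)
        return alpha

    def predict(alpha, X_train, X_new, gamma=1.0):
        """Inductive prediction f(x) = sum_j alpha_j k(x_j, x) on unseen instances."""
        return rbf_kernel(X_new, X_train, gamma) @ alpha

Note that prediction on a new instance x only requires the kernel values k(x_j, x), which is exactly what makes the learned f inductive rather than transductive.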
Beyond the unnormalized graph Laplacian matrix L, the normalized graph Laplacian matrix ℒ is often used too:

    \mathcal{L} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2} .    (5.19)
This results in a slightly different regularization term

    \mathbf{f}^\top \mathcal{L} \mathbf{f} = \frac{1}{2} \sum_{i,j=1}^{l+u} w_{ij} \left( \frac{f(x_i)}{\sqrt{D_{ii}}} - \frac{f(x_j)}{\sqrt{D_{jj}}} \right)^2 .    (5.20)
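As a quick numerical check of (5.19) and (5.20), the short sketch below (Python/NumPy, with a randomly generated weight matrix purely for illustration) builds ℒ from W and verifies that the quadratic form fᵀℒf equals the weighted sum of squared differences:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    W = rng.random((n, n))
    W = (W + W.T) / 2                 # symmetric, nonnegative edge weights
    np.fill_diagonal(W, 0.0)

    d = W.sum(axis=1)                 # degrees D_ii
    L = np.diag(d) - W                # unnormalized Laplacian L = D - W
    L_norm = np.diag(d ** -0.5) @ L @ np.diag(d ** -0.5)   # normalized Laplacian (5.19)

    f = rng.standard_normal(n)
    lhs = f @ L_norm @ f              # quadratic form on the left of (5.20)
    rhs = 0.5 * sum(W[i, j] * (f[i] / np.sqrt(d[i]) - f[j] / np.sqrt(d[j])) ** 2
                    for i in range(n) for j in range(n))
    assert np.isclose(lhs, rhs)       # the two expressions agree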
Other variations like L^p or ℒ^p, where p > 0, are possible too. They replace the matrix L in (5.18). These all encode the same overall label-smoothness assumption on the graph, but with varying subtleties. We discuss several properties of L below. Please see the references at the end of the chapter to learn more about the other variations.
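One way to realize such a variation, assuming L^p is read as a matrix power of the (positive semi-definite) Laplacian, is through its eigendecomposition; the resulting matrix would simply take the place of L in (5.18). A minimal sketch:

    import numpy as np

    def laplacian_power(L, p):
        """L^p for a symmetric positive semi-definite Laplacian and any p > 0,
        obtained by raising the eigenvalues of L to the power p."""
        lam, U = np.linalg.eigh(L)           # L = U diag(lam) U^T
        lam = np.clip(lam, 0.0, None)        # guard against tiny negative round-off
        return U @ np.diag(lam ** p) @ U.T

Whichever variation is chosen, only the matrix inside the quadratic regularizer changes; the rest of the optimization in (5.18) stays the same.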