where $a_i$, $i = 1, \ldots, l+u$ are real-valued coefficients. It is possible to show that the graph regularizer (5.12) can be written as
$$f^\top L f = \sum_{i=1}^{l+u} a_i^2 \lambda_i. \tag{5.25}$$
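The identity in (5.25) is easy to verify numerically. The sketch below (a hypothetical 4-node path graph, not an example from the text) compares $f^\top L f$ against $\sum_i a_i^2 \lambda_i$ computed from the eigendecomposition of the Laplacian:

```python
import numpy as np

# A small path graph on 4 nodes: 1-2-3-4, with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W                      # unnormalized graph Laplacian

# Spectral decomposition L = sum_i lambda_i phi_i phi_i^T;
# eigh returns eigenvalues in ascending order, eigenvectors as columns.
lam, Phi = np.linalg.eigh(L)

f = np.array([0.3, -1.2, 0.5, 2.0])   # an arbitrary function on the nodes
a = Phi.T @ f                          # coefficients a_i = phi_i^T f

lhs = f @ L @ f
rhs = np.sum(a**2 * lam)               # right-hand side of (5.25)
print(np.allclose(lhs, rhs))           # True
```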
If for each $i$, either $a_i$ or $\lambda_i$ is close to zero, then $f^\top L f$ will be small. Intuitively, this means the graph regularizer $f^\top L f$ prefers an $f$ that only uses smooth basis vectors (those with a small $\lambda_i$). By “uses” we mean that $f$'s corresponding coefficients have large magnitude $|a_i|$. In particular, $f^\top L f$ is minimized, and equals zero, if $f$ is in the subspace spanned by $\phi_1, \ldots, \phi_k$ for a graph with $k$ connected components:
$$f = \sum_{i=1}^{k} a_i \phi_i, \qquad a_i = 0 \text{ for } i > k. \tag{5.26}$$
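As a quick numerical check of (5.26), consider an assumed toy graph with $k = 2$ connected components: its Laplacian has exactly two zero eigenvalues, and any $f$ in the span of the corresponding eigenvectors incurs zero regularization:

```python
import numpy as np

# Two disconnected components: nodes {0,1} and {2,3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

lam, Phi = np.linalg.eigh(L)   # ascending: the first k = 2 eigenvalues are 0
print(np.round(lam, 10))

# Any f in the span of phi_1, phi_2 satisfies f^T L f = 0, as in (5.26).
f = 3.0 * Phi[:, 0] - 2.0 * Phi[:, 1]
print(np.isclose(f @ L @ f, 0.0))   # True
```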
For a connected graph, only $\lambda_1 = 0$, and $\phi_1 = (1/\sqrt{l+u}, \ldots, 1/\sqrt{l+u})^\top$. Any constant vector $f$ thus has coefficients $a_1 \neq 0$, $a_i = 0$ for $i > 1$, and is a minimizer of $f^\top L f$. Being a constant, it is certainly the smoothest function on the graph.
Therefore, we see the connection between graph-based semi-supervised learning methods and the graph spectrum. This exposes a major weakness of this family of methods: the performance is sensitive to the graph structure and edge weights.
Example 5.4. Harmonic Function with a Bad Graph. To demonstrate the previous point, Figure 5.5 presents a dataset composed of two semi-circles (one per class) that intersect. This creates
a problem for the types of graphs traditionally used in graph-based semi-supervised learning. Figure 5.5(a) shows the symmetric 4-NN graph for this data. An $\epsilon$NN graph is similar. Note that, near
the intersection of the two curves, many edges connect instances on one curve to instances on the
other. This means that any algorithm that assumes label smoothness with respect to the graph is
likely to make many mistakes in this region. Labels will propagate between the classes, which is
clearly undesirable.
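This effect can be reproduced in a few lines. The sketch below builds a symmetric 4-NN graph on a stand-in for the two intersecting semi-circles (the point layout is assumed, not taken from Figure 5.5) and counts the between-class edges:

```python
import numpy as np

# Stand-in data: two upper semi-circular arcs of 50 points each; the second
# arc is shifted so the curves intersect, as in the example.
t = np.linspace(0.0, np.pi, 50)
X = np.vstack([np.c_[np.cos(t), np.sin(t)],          # class 0
               np.c_[1.0 + np.cos(t), np.sin(t)]])   # class 1 (shifted arc)
y = np.array([0] * 50 + [1] * 50)

# Symmetric 4-NN graph: i and j are connected if either is among the
# other's 4 nearest neighbors.
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
np.fill_diagonal(D, np.inf)
knn = np.argsort(D, axis=1)[:, :4]
A = np.zeros((100, 100), dtype=bool)
for i in range(100):
    A[i, knn[i]] = True
A = A | A.T                                          # symmetrize

# Edges connecting the two classes: these let labels leak across curves.
between = np.sum(A & (y[:, None] != y[None, :])) // 2
print(between > 0)   # True: the graph connects the two classes
```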
We applied the harmonic function with only two labeled instances (large X and O) using the
graph shown and weights as in (5.1). Figure 5.5(b) shows the predicted labels (small x's and o's) for
the unlabeled instances. We observe that the predictions are very poor, as the O label propagates
through most of the other class's instances due to the between-class connections in the graph. Even a
simple linear classifier using only the labeled instances would be able to correctly predict the left half
of the X curve (i.e., the decision boundary would be a diagonal line between the labeled instances).
While graph-based semi-supervised learning can be a powerful method to incorporate unlabeled
data, one must be careful to ensure that the graph encodes the correct label smoothness assumption.
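For reference, once the graph is fixed, the harmonic function is cheap to compute: the values on the unlabeled nodes solve $L_{uu} f_u = -L_{ul} y_l$, so each unlabeled node takes the weighted average of its neighbors. A minimal sketch on an assumed 5-node toy graph (not the graph of Figure 5.5), with nodes 0 and 1 labeled:

```python
import numpy as np

# Toy graph with edges (0,2), (0,3), (1,4), (2,3), (3,4); unit weights.
W = np.array([[0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

y_l = np.array([1.0, -1.0])   # node 0 labeled +1 (X), node 1 labeled -1 (O)
l = 2                         # number of labeled nodes (listed first)

# Harmonic function: solve L_uu f_u = -L_ul y_l for the unlabeled nodes.
L_uu = L[l:, l:]
L_ul = L[l:, :l]
f_u = np.linalg.solve(L_uu, -L_ul @ y_l)
print(f_u)   # nodes 2 and 3 lean toward +1 (X); node 4 toward -1 (O)
```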
To handle this dataset properly and obtain all correct predictions, the graph would need to split
the data into two disconnected components. One approach to building such a graph is to examine
the local neighborhood around each instance and only connect instances whose neighborhoods have
similar shapes: neighborhoods along the same curve would look similar with only a minor rotation,
 