where $a_i$, $i = 1, \ldots, l+u$ are real-valued coefficients. It is possible to show that the graph regularizer (5.12) can be written as
$$f^\top L f = \sum_{i=1}^{l+u} a_i^2 \lambda_i. \tag{5.25}$$
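The identity in (5.25) is easy to verify numerically. The sketch below (a hypothetical 4-node path graph, not an example from the text) compares $f^\top L f$ against $\sum_i a_i^2 \lambda_i$ computed from the eigendecomposition of the Laplacian:

```python
import numpy as np

# A small path graph on 4 nodes: 1-2-3-4, with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W                      # unnormalized graph Laplacian

# Spectral decomposition L = sum_i lambda_i phi_i phi_i^T;
# eigh returns eigenvalues in ascending order, eigenvectors as columns.
lam, Phi = np.linalg.eigh(L)

f = np.array([0.3, -1.2, 0.5, 2.0])   # an arbitrary function on the nodes
a = Phi.T @ f                          # coefficients a_i = phi_i^T f

lhs = f @ L @ f
rhs = np.sum(a**2 * lam)               # right-hand side of (5.25)
print(np.allclose(lhs, rhs))           # True
```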
If for each $i$, either $a_i$ or $\lambda_i$ is close to zero, then $f^\top L f$ will be small. Intuitively, this means the graph regularizer $f^\top L f$ prefers an $f$ that only uses smooth basis vectors (those with a small $\lambda_i$). By “uses” we mean that $f$'s corresponding coefficients have large magnitude $|a_i|$. In particular, $f^\top L f$ is minimized, and equals zero, if $f$ is in the subspace spanned by $\phi_1, \ldots, \phi_k$ for a graph with $k$ connected components:
$$f = \sum_{i=1}^{k} a_i \phi_i, \qquad a_i = 0 \text{ for } i > k. \tag{5.26}$$
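As a quick numerical check of (5.26), consider an assumed toy graph with $k = 2$ connected components: its Laplacian has exactly two zero eigenvalues, and any $f$ in the span of the corresponding eigenvectors incurs zero regularization:

```python
import numpy as np

# Two disconnected components: nodes {0,1} and {2,3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

lam, Phi = np.linalg.eigh(L)   # ascending: the first k = 2 eigenvalues are 0
print(np.round(lam, 10))

# Any f in the span of phi_1, phi_2 satisfies f^T L f = 0, as in (5.26).
f = 3.0 * Phi[:, 0] - 2.0 * Phi[:, 1]
print(np.isclose(f @ L @ f, 0.0))   # True
```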
For a connected graph, only $\lambda_1 = 0$, and $\phi_1 = (1/\sqrt{l+u}, \ldots, 1/\sqrt{l+u})^\top$. Any constant vector $f$ thus has coefficients $a_1 \neq 0$, $a_i = 0$ for $i > 1$, and is a minimizer of $f^\top L f$. Being a constant, it is certainly the smoothest function on the graph.
Therefore, we see the connection between graph-based semi-supervised learning methods and the graph spectrum. This exposes a major weakness of this family of methods: the performance is sensitive to the graph structure and edge weights.
Example 5.4. Harmonic Function with a Bad Graph. To demonstrate the previous point, Figure 5.5 presents a dataset composed of two semi-circles (one per class) that intersect. This creates
a problem for the types of graphs traditionally used in graph-based semi-supervised learning. Figure 5.5(a) shows the symmetric 4-NN graph for this data. An $\epsilon$NN graph is similar. Note that, near
the intersection of the two curves, many edges connect instances on one curve to instances on the
other. This means that any algorithm that assumes label smoothness with respect to the graph is
likely to make many mistakes in this region. Labels will propagate between the classes, which is
clearly undesirable.
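This effect can be reproduced in a few lines. The sketch below builds a symmetric 4-NN graph on a stand-in for the two intersecting semi-circles (the point layout is assumed, not taken from Figure 5.5) and counts the between-class edges:

```python
import numpy as np

# Stand-in data: two upper semi-circular arcs of 50 points each; the second
# arc is shifted so the curves intersect, as in the example.
t = np.linspace(0.0, np.pi, 50)
X = np.vstack([np.c_[np.cos(t), np.sin(t)],          # class 0
               np.c_[1.0 + np.cos(t), np.sin(t)]])   # class 1 (shifted arc)
y = np.array([0] * 50 + [1] * 50)

# Symmetric 4-NN graph: i and j are connected if either is among the
# other's 4 nearest neighbors.
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
np.fill_diagonal(D, np.inf)
knn = np.argsort(D, axis=1)[:, :4]
A = np.zeros((100, 100), dtype=bool)
for i in range(100):
    A[i, knn[i]] = True
A = A | A.T                                          # symmetrize

# Edges connecting the two classes: these let labels leak across curves.
between = np.sum(A & (y[:, None] != y[None, :])) // 2
print(between > 0)   # True: the graph connects the two classes
```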
We applied the harmonic function with only two labeled instances (large X and O) using the
graph shown and weights as in (5.1). Figure 5.5(b) shows the predicted labels (small x's and o's) for
the unlabeled instances. We observe that the predictions are very poor, as the O label propagates
through most of the other class's instances due to the between-class connections in the graph. Even a
simple linear classifier using only the labeled instances would be able to correctly predict the left half
of the X curve (i.e., the decision boundary would be a diagonal line between the labeled instances).
While graph-based semi-supervised learning can be a powerful method to incorporate unlabeled
data, one must be careful to ensure that the graph encodes the correct label smoothness assumption.
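For reference, once the graph is fixed, the harmonic function is cheap to compute: the values on the unlabeled nodes solve $L_{uu} f_u = -L_{ul} y_l$, so each unlabeled node takes the weighted average of its neighbors. A minimal sketch on an assumed 5-node toy graph (not the graph of Figure 5.5), with nodes 0 and 1 labeled:

```python
import numpy as np

# Toy graph with edges (0,2), (0,3), (1,4), (2,3), (3,4); unit weights.
W = np.array([[0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1],
              [1, 0, 0, 1, 0],
              [1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

y_l = np.array([1.0, -1.0])   # node 0 labeled +1 (X), node 1 labeled -1 (O)
l = 2                         # number of labeled nodes (listed first)

# Harmonic function: solve L_uu f_u = -L_ul y_l for the unlabeled nodes.
L_uu = L[l:, l:]
L_ul = L[l:, :l]
f_u = np.linalg.solve(L_uu, -L_ul @ y_l)
print(f_u)   # nodes 2 and 3 lean toward +1 (X); node 4 toward -1 (O)
```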
To handle this dataset properly and obtain all correct predictions, the graph would need to split
the data into two disconnected components. One approach to building such a graph is to examine
the local neighborhood around each instance and only connect instances whose neighborhoods have
similar shapes: neighborhoods along the same curve would look similar with only a minor rotation,
 