Graph-Based Semi-Supervised Learning - Introduction to Semi-Supervised Learning


CHAPTER 5. Graph-Based Semi-Supervised Learning

5.1 UNLABELED DATA AS STEPPING STONES

Alice was flipping through the magazine “Sky and Earth,” in which each article is either about astronomy or travel. Speaking no English, she had to guess the topic of each article from its pictures. The first story “Bright Asteroid” had a picture of a cratered asteroid; it was obviously about astronomy. The second story “Yellowstone Camping” had a picture of grizzly bears; she figured it must be a travel article.

But no other articles had pictures. “What is the use of a magazine without pictures?” thought Alice. The third article was titled “Zodiac Light,” while the fourth “Airport Bike Rental.” Not knowing any words and without pictures, it seemed impossible to guess the topic of these articles.

However, Alice is a resourceful person. She noticed the titles of other articles include “Asteroid and Comet,” “Comet Light Curve,” “Camping in Denali,” and “Denali Airport.” “I'll assume that if two titles share a word, they are about the same topic,” she thought. And she started to doodle:

[Figure: Alice's doodle. Articles sharing title words are connected.]

Then it became clear. “Aha! 'Zodiac Light' is about astronomy, and 'Airport Bike Rental' is about travel!” exclaimed Alice. And she was correct. Alice just performed graph-based semi-supervised learning without knowing it.
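Alice's reasoning can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names `build_graph` and `propagate` are hypothetical, titles are compared as lowercased whitespace-separated words, and labels are spread by a plain breadth-first walk from the two articles with pictures. It mirrors Alice's shared-word rule, not any particular algorithm from the chapter.

```python
from collections import deque
from itertools import combinations

# The eight article titles mentioned in the story.
titles = [
    "Bright Asteroid", "Yellowstone Camping", "Zodiac Light",
    "Airport Bike Rental", "Asteroid and Comet", "Comet Light Curve",
    "Camping in Denali", "Denali Airport",
]

# The two articles Alice could label from their pictures.
seeds = {"Bright Asteroid": "astronomy", "Yellowstone Camping": "travel"}


def build_graph(titles):
    """Connect two titles whenever they share at least one word (assumed rule)."""
    words = {t: set(t.lower().split()) for t in titles}
    graph = {t: set() for t in titles}
    for a, b in combinations(titles, 2):
        if words[a] & words[b]:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph, seeds):
    """Copy each labeled vertex's label to its unlabeled neighbors, breadth first."""
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in labels:
                labels[neighbor] = labels[node]
                queue.append(neighbor)
    return labels


graph = build_graph(titles)
labels = propagate(graph, seeds)
print(labels["Zodiac Light"])         # astronomy
print(labels["Airport Bike Rental"])  # travel
```

The unlabeled titles act as stepping stones: “Zodiac Light” shares no word with “Bright Asteroid,” but it is reached through “Asteroid and Comet” and “Comet Light Curve.”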

5.2 THE GRAPH

Graph-based semi-supervised learning starts by constructing a graph from the training data. Given training data $\{(x_i, y_i)\}_{i=1}^{l}$, $\{x_j\}_{j=l+1}^{l+u}$, the vertices are the labeled and unlabeled instances $\{x_i\}_{i=1}^{l} \cup \{x_j\}_{j=l+1}^{l+u}$. Clearly, this is a large graph if $u$, the unlabeled data size, is big. Note that once the graph is built, learning will involve assigning $y$ values to the vertices in the graph. This is made possible
