CHAPTER 5
Graph-Based Semi-Supervised Learning
5.1 UNLABELED DATA AS STEPPING STONES
Alice was flipping through the magazine “Sky and Earth,” in which each article is either about
astronomy or travel. Speaking no English, she had to guess the topic of each article from its pictures.
The first story “Bright Asteroid” had a picture of a cratered asteroid—it was obviously about
astronomy. The second story “Yellowstone Camping” had a picture of grizzly bears—she figured it
must be a travel article.
But no other articles had pictures. “What is the use of a magazine without pictures?” thought
Alice. The third article was titled “Zodiac Light,” while the fourth “Airport Bike Rental.” Not
knowing any words and without pictures, it seemed impossible to guess the topic of these articles.
However, Alice is a resourceful person. She noticed that the titles of other articles included “Asteroid
and Comet,” “Comet Light Curve,” “Camping in Denali,” and “Denali Airport.” “I'll assume that
if two titles share a word, they are about the same topic,” she thought. And she started to doodle:
Alice's doodle. Articles sharing title words are connected.
Then it became clear. “Aha! 'Zodiac Light' is about astronomy, and 'Airport Bike Rental' is about
travel!” exclaimed Alice. And she was correct. Alice just performed graph-based semi-supervised
learning without knowing it.
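Alice's reasoning can be sketched in a few lines of Python. This toy reconstruction assumes her rule exactly as stated (two titles are connected if they share any word, compared case-insensitively) and that a known label simply spreads to every article reachable from it. The helper names `build_graph` and `propagate` are hypothetical, and breadth-first spreading is a deliberate simplification, not a full graph-based learning algorithm.

```python
from collections import deque
from itertools import combinations

# Titles from the story; only the first two are labeled (from their pictures).
titles = [
    "Bright Asteroid",
    "Yellowstone Camping",
    "Zodiac Light",
    "Airport Bike Rental",
    "Asteroid and Comet",
    "Comet Light Curve",
    "Camping in Denali",
    "Denali Airport",
]
labels = {"Bright Asteroid": "astronomy", "Yellowstone Camping": "travel"}


def build_graph(titles):
    """Hypothetical helper: connect two titles if they share at least one word."""
    words = {t: set(t.lower().split()) for t in titles}
    graph = {t: set() for t in titles}
    for a, b in combinations(titles, 2):
        if words[a] & words[b]:  # non-empty intersection means a shared word
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph, labels):
    """Hypothetical helper: spread each known label breadth-first to reachable titles."""
    assigned = dict(labels)
    queue = deque(labels)
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in assigned:  # first label to arrive is kept
                assigned[neighbor] = assigned[node]
                queue.append(neighbor)
    return assigned


graph = build_graph(titles)
result = propagate(graph, labels)
print(result["Zodiac Light"])         # prints: astronomy
print(result["Airport Bike Rental"])  # prints: travel
```

Here the titles split into two separate connected components, one chained through “Asteroid,” “Comet,” and “Light,” the other through “Camping,” “Denali,” and “Airport,” so every unlabeled article receives exactly one label. If an article were reachable from both labeled articles, this sketch would keep whichever label reached it first; a real method needs a principled way to resolve such conflicts.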
5.2 THE GRAPH
Graph-based semi-supervised learning starts by constructing a graph from the training data. Given
training data $\{(x_i, y_i)\}_{i=1}^{l}, \{x_j\}_{j=l+1}^{l+u}$, the vertices are the labeled and unlabeled instances
$\{x_i\}_{i=1}^{l} \cup \{x_j\}_{j=l+1}^{l+u}$. Clearly, this is a large graph if $u$, the unlabeled data size, is big. Note that once the graph
is built, learning will involve assigning $y$ values to the vertices in the graph. This is made possible