FIGURE 3.1: A small text classification problem. Each box denotes a document, each directed edge between a pair of boxes denotes a hyperlink, and each oval node denotes a random variable. Assume the smaller oval nodes within each box represent the presence of the words w1, w2, and w3 in the document, and the larger oval nodes the label of the document, where the set of label values is L = {L1, L2}. A shaded oval denotes an observed variable, whereas an unshaded oval node denotes an unobserved variable whose value needs to be predicted.
This is straightforward when we consider only document content; the problem becomes more complicated when we also consider the links among the documents with missing labels. In this case, we want to optimize the labels jointly, or collectively, and we refer to the problem as collective classification.
Collective classification methods range from simple local influence propagation algorithms to more complex global optimization algorithms. At their heart, they try to model the combined correlations among the labels of neighboring documents. Some models assume that neighboring labels are likely to be the same or similar (homophily, or autocorrelation), while others are capable of learning more complex dependencies.
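To make the homophily idea concrete, the following is a minimal sketch of a local influence propagation scheme: each document with a missing label repeatedly adopts the majority label of its currently labeled neighbors until the predictions stabilize. The function propagate_labels, the toy graph, and the labels are hypothetical illustrations, not an algorithm defined in this chapter; a real collective classifier would also combine this relational evidence with a content-based classifier over the words in each document.

    from collections import Counter

    def propagate_labels(neighbors, observed, max_iters=10):
        # Iteratively assign each unobserved node the majority label
        # among its currently labeled neighbors -- a simple local
        # influence propagation scheme encoding the homophily assumption.
        labels = dict(observed)        # start from the observed labels
        for _ in range(max_iters):
            changed = False
            for node, nbrs in neighbors.items():
                if node in observed:
                    continue           # never overwrite an observed label
                nbr_labels = [labels[n] for n in nbrs if n in labels]
                if not nbr_labels:
                    continue           # no labeled neighbors yet; retry next pass
                guess = Counter(nbr_labels).most_common(1)[0][0]
                if labels.get(node) != guess:
                    labels[node] = guess
                    changed = True
            if not changed:            # stop once the predictions stabilize
                break
        return labels

    # Toy graph loosely echoing Figure 3.1: X1's label is observed (L1),
    # while the labels of X2 and X3 must be predicted from their links.
    neighbors = {"X1": ["X2"], "X2": ["X1", "X3"], "X3": ["X2"]}
    observed = {"X1": "L1"}
    print(propagate_labels(neighbors, observed))  # X2 and X3 both inherit L1

Even this crude sketch shows the collective character of the problem: X3 has no labeled neighbor at all, yet it receives a label because the prediction for X2 propagates one step further along the link structure.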
In this chapter, we present several of the common algorithms for collective classification. Our collection of algorithms is not exhaustive, nor do we cover the most advanced algorithms. Instead, we aim to give the reader a simple tutorial introduction to the methods, focusing on the algorithms rather than their mathematical justification.