FIGURE 3.1: A small text classification problem. Each box denotes a document, each directed edge between a pair of boxes denotes a hyperlink, and each oval node denotes a random variable. Assume the smaller oval nodes within each box represent the presence of the words w1, w2, and w3 in the document, and the larger oval nodes the label of the document, where the set of label values is L = {L1, L2}. A shaded oval denotes an observed variable, whereas an unshaded oval node denotes an unobserved variable whose value needs to be predicted.
This is straightforward when we consider only document content; the problem becomes more complicated when we also consider the links among the documents with missing labels. In this case, we want to optimize the labels jointly, or collectively, and we refer to the problem as collective classification.
Collective classification methods range from simple local influence propagation algorithms to more complex global optimization algorithms. At their heart, they try to model the combined correlations among the labels of neighboring documents. Some models assume that neighboring labels are likely to be the same or similar (homophily, or autocorrelation), while others are capable of learning more complex dependencies.
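To make the homophily idea concrete, the following is a minimal sketch of a local influence propagation scheme: each document with a missing label repeatedly adopts the majority label of its currently labeled neighbors until the predictions stabilize. The function propagate_labels, the toy graph, and the labels are hypothetical illustrations, not an algorithm defined in this chapter; a real collective classifier would also combine this relational evidence with a content-based classifier over the words in each document.

    from collections import Counter

    def propagate_labels(neighbors, observed, max_iters=10):
        # Iteratively assign each unobserved node the majority label
        # among its currently labeled neighbors -- a simple local
        # influence propagation scheme encoding the homophily assumption.
        labels = dict(observed)        # start from the observed labels
        for _ in range(max_iters):
            changed = False
            for node, nbrs in neighbors.items():
                if node in observed:
                    continue           # never overwrite an observed label
                nbr_labels = [labels[n] for n in nbrs if n in labels]
                if not nbr_labels:
                    continue           # no labeled neighbors yet; retry next pass
                guess = Counter(nbr_labels).most_common(1)[0][0]
                if labels.get(node) != guess:
                    labels[node] = guess
                    changed = True
            if not changed:            # stop once the predictions stabilize
                break
        return labels

    # Toy graph loosely echoing Figure 3.1: X1's label is observed (L1),
    # while the labels of X2 and X3 must be predicted from their links.
    neighbors = {"X1": ["X2"], "X2": ["X1", "X3"], "X3": ["X2"]}
    observed = {"X1": "L1"}
    print(propagate_labels(neighbors, observed))  # X2 and X3 both inherit L1

Even this crude sketch shows the collective character of the problem: X3 has no labeled neighbor at all, yet it receives a label because the prediction for X2 propagates one step further along the link structure.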
In this chapter, we present several of the common algorithms for collective classification. Our collection of algorithms is not exhaustive, nor do we cover the most advanced algorithms. Instead, we aim to give the reader a simple tutorial introduction to the methods, focusing on the algorithms rather than their mathematical justification.