Chapter 3
Collective Classification for Text Classification
Galileo Namata, Prithviraj Sen, Mustafa Bilgic, and Lise Getoor
3.1 Introduction
3.2 Collective Classification: Notation and Problem Definition
3.3 Approximate Inference Algorithms for Approaches Based on Local Conditional Classifiers
3.4 Approximate Inference Algorithms for Approaches Based on Global Formulations
3.5 Learning the Classifiers
3.6 Experimental Comparison
3.7 Related Work
3.8 Conclusion
3.9 Acknowledgments
3.1 Introduction
Text classification, the classification of text documents according to
categories or topics, is an important component of any text processing system.
A large body of work uses content (the words appearing in the documents and
the structure of the documents) together with external sources to build
accurate document classifiers. In addition, a growing body of literature
covers methods that use the link structure among the documents to improve
document classification performance.
Text documents can be connected in a variety of ways. The most common link
structure is the citation graph: for example, papers cite other papers, and
webpages link to other webpages. Links among papers can also be constructed
from other relationships, such as co-authorship, co-citation, and appearance
at a conference venue. All of these can be combined to create an interlinked
collection of text documents.
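As a minimal sketch of how such link types might be combined, the following Python example builds an undirected link structure from citation links plus the co-citation links derived from them. The paper IDs, the `citations` data, and the `build_links` helper are hypothetical and only illustrate the idea; co-citation is taken here to mean that two papers are cited by the same paper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each paper ID maps to the papers it cites.
citations = {
    "p1": ["p3", "p4"],
    "p2": ["p3", "p4"],
    "p3": [],
    "p4": [],
}


def build_links(citations):
    """Return an undirected adjacency map combining citation and co-citation links."""
    links = defaultdict(set)
    for paper in citations:
        links[paper]  # register every paper, even one without links
    for paper, cited in citations.items():
        # Citation links: connect a paper with each paper it cites.
        for target in cited:
            links[paper].add(target)
            links[target].add(paper)
        # Co-citation links: connect two papers that are cited by the same paper.
        for a, b in combinations(sorted(set(cited)), 2):
            links[a].add(b)
            links[b].add(a)
    return {paper: sorted(neighbors) for paper, neighbors in links.items()}


graph = build_links(citations)
for paper, neighbors in sorted(graph.items()):
    print(paper, "->", neighbors)
```

In this toy collection, p3 and p4 never cite each other, yet they end up linked because p1 and p2 cite both. Other relationships, such as shared authors or venues, could be merged into the same adjacency map in the same way.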
In these cases, we are often interested not in determining the topic of a
single document; rather, we have a collection of unlabeled (or partially
labeled) documents and want to correctly infer values for all of the missing
labels.