Classifying Texts and Documents - Natural Language Processing with Java


Chapter 6. Classifying Texts and Documents

In this chapter, we will demonstrate how to use various NLP APIs to perform text classification. This is not to be confused with text clustering. Clustering groups texts without the use of predefined categories. Classification, in contrast, assigns texts to predefined categories. We will focus on text classification, in which tags are assigned to text to specify its type.

The general approach to text classification starts with training a model. The model is then validated and used to classify documents. We will focus on the training and usage steps.
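To make the train, validate, and classify steps concrete, here is a minimal sketch in plain Java with no NLP library. It uses a multinomial Naive Bayes model with add-one smoothing, which is one common technique chosen here for illustration; it is not necessarily the approach used by the APIs covered in this chapter. The class name `NaiveBayesSketch`, its methods (`train`, `classify`, `accuracy`), the two categories, and all sample sentences are hypothetical, and validation is assumed to mean measuring accuracy on held-out labeled text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Minimal multinomial Naive Bayes text classifier (hypothetical, illustrative sketch only). */
public class NaiveBayesSketch {
    // label -> (word -> occurrences of that word in the label's training text)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // label -> total number of words seen for that label
    private final Map<String, Integer> totalWords = new HashMap<>();
    // label -> number of training documents with that label
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase and split on non-letters; a real tokenizer would do much more.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Training step: add one human-labeled sample document. */
    public void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    /** Usage step: return the predefined label with the highest log-probability. */
    public String classify(String text) {
        List<String> words = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training documents carrying this label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int labelTotal = totalWords.getOrDefault(label, 0);
            for (String word : words) {
                // Add-one (Laplace) smoothing keeps unseen words from zeroing the score
                int c = counts.getOrDefault(word, 0);
                score += Math.log((c + 1.0) / (labelTotal + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    /** Validation step: fraction of held-out {label, text} pairs classified correctly. */
    public double accuracy(String[][] labeledSamples) {
        int correct = 0;
        for (String[] sample : labeledSamples) {
            if (sample[0].equals(classify(sample[1]))) {
                correct++;
            }
        }
        return (double) correct / labeledSamples.length;
    }

    public static void main(String[] args) {
        NaiveBayesSketch model = new NaiveBayesSketch();
        // Hypothetical labeled training data with two predefined categories
        model.train("sports", "the team won the football match");
        model.train("sports", "a late goal decided the game");
        model.train("finance", "the bank raised interest rates");
        model.train("finance", "stock prices fell on the market");

        // Hypothetical held-out data used only for validation
        String[][] heldOut = {
            {"sports", "the team scored a goal"},
            {"finance", "interest rates and stock prices"}
        };
        System.out.println("Validation accuracy: " + model.accuracy(heldOut));
        System.out.println(model.classify("the market rallied as the bank reported profits"));
    }
}
```

Running `main` prints `Validation accuracy: 1.0` followed by `finance`. Log-probabilities are summed instead of multiplying raw probabilities so that long documents do not underflow to zero; a realistic model would need far more training data than these four sentences.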

Documents can be classified according to any number of attributes, such as their subject, document type, time of publication, author, language used, and reading level. Some classification approaches require humans to label sample data.

Sentiment analysis is a type of classification. It is concerned with determining what a text is trying to convey to a reader, usually in the form of a positive or negative attitude. We will investigate several techniques to perform this type of analysis.
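One of the simplest ways to treat sentiment analysis as classification is a lexicon-based approach: count words known to carry a positive or negative attitude and report whichever side dominates. The sketch below is plain Java with no NLP library; the class `LexiconSentiment` and its two tiny word lists are hypothetical, and this is not presented as one of the specific techniques the chapter goes on to cover. A trained classifier using "positive" and "negative" labels is another common option.

```java
import java.util.Set;

/** Lexicon-based sentiment sketch (hypothetical word lists, illustration only). */
public class LexiconSentiment {
    // Tiny hypothetical lexicons; real sentiment lexicons contain thousands of entries.
    private static final Set<String> POSITIVE = Set.of("good", "great", "excellent", "love", "enjoyed");
    private static final Set<String> NEGATIVE = Set.of("bad", "poor", "terrible", "hate", "boring");

    /** Returns "positive", "negative", or "neutral" based on net lexicon hits. */
    public static String classify(String text) {
        int score = 0;
        // Lowercase and split on non-letters; empty tokens match neither lexicon
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify("I enjoyed this great book"));
        System.out.println(classify("The plot was boring and the ending was terrible"));
        System.out.println(classify("The book has ten chapters"));
    }
}
```

This prints `positive`, `negative`, and `neutral`. The word-counting design is easy to follow but ignores context: a phrase such as "not good" is scored as positive, which is one reason trained models are often preferred. `Set.of` requires Java 9 or later.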
