Using LingPipe to classify text
We will use LingPipe to demonstrate a number of classification tasks including general text
classification using trained models, sentiment analysis, and language identification. We will
cover the following classification topics:
• Training text using the Classified class
• Training models using other training categories
• How to classify text using LingPipe
• Performing sentiment analysis using LingPipe
• Identifying the language used
Several of the tasks described in this section will use the following declarations. LingPipe
comes with training data for several categories. The categories array contains the
names of the categories packaged with LingPipe:
String[] categories = {"soc.religion.christian",
    "talk.religion.misc", "alt.atheism", "misc.forsale"};
The DynamicLMClassifier class is used to perform the actual classification. It is created
using the categories array, which gives it the names of the categories to use. The
nGramSize value specifies the maximum length of the character sequences (n-grams) used in
the model for classification purposes:
// Requires com.aliasi.classify.DynamicLMClassifier and com.aliasi.lm.NGramProcessLM
int nGramSize = 6;
DynamicLMClassifier<NGramProcessLM> classifier =
    DynamicLMClassifier.createNGramProcess(
        categories, nGramSize);
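To make the role of nGramSize more concrete, the following is a minimal sketch in plain Java that lists the character windows of a given length in a string. The class name CharNGrams and the method charNGrams are hypothetical names chosen for this illustration and are not part of LingPipe. The sketch only shows windows of the maximum length; it does not reproduce how LingPipe's language model stores or smooths its counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the LingPipe API.
class CharNGrams {
    // Returns every contiguous run of n characters in text, from left to right.
    static List<String> charNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        int nGramSize = 6;
        // Brackets make the spaces inside each window visible.
        for (String gram : charNGrams("for sale", nGramSize)) {
            System.out.println("[" + gram + "]");
        }
    }
}
```

A string shorter than nGramSize yields no windows at all in this sketch, which is one reason very small n-gram sizes are sometimes preferred for short texts.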
Training text using the Classified class
General text classification using LingPipe involves training the DynamicLMClassifier
class using training files and then using the class to perform the actual classification.
LingPipe comes with several training datasets, found in the LingPipe directory
demos/data/fourNewsGroups/4news-train. We will use these to illustrate the training
process. This example is a simplified version of the process found at
http://alias-i.com/lingpipe/demos/tutorial/classify/read-me.html.
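The train-then-classify flow described above can be sketched without any library. The toy class below counts character n-grams per category during training and then picks the category whose counts best explain a new text, using add-one smoothing. Everything in it (the class ToyNGramClassifier, its methods, and the two short training strings) is an assumption made for this illustration; it is not LingPipe's implementation, and the strings are not taken from the 4news-train data.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy character n-gram classifier: an illustration only, not LingPipe's implementation.
class ToyNGramClassifier {
    private final int n;
    // Per-category n-gram counts; LinkedHashMap keeps category order deterministic.
    private final Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
    // Total number of n-grams seen per category.
    private final Map<String, Integer> totals = new LinkedHashMap<>();
    // Distinct n-grams seen across all categories, used for smoothing.
    private final Set<String> vocabulary = new HashSet<>();

    ToyNGramClassifier(String[] categories, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        for (String category : categories) {
            counts.put(category, new LinkedHashMap<>());
            totals.put(category, 0);
        }
    }

    // Training step: add the n-grams of one labeled text to its category.
    void train(String category, String text) {
        Map<String, Integer> categoryCounts = counts.get(category);
        if (categoryCounts == null) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        for (int i = 0; i + n <= text.length(); i++) {
            String gram = text.substring(i, i + n);
            categoryCounts.merge(gram, 1, Integer::sum);
            totals.merge(category, 1, Integer::sum);
            vocabulary.add(gram);
        }
    }

    // Classification step: return the category with the highest smoothed log score.
    // Texts shorter than n score 0 everywhere, so the first category is returned.
    String bestCategory(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : counts.keySet()) {
            double denominator = totals.get(category) + vocabulary.size() + 1.0;
            double score = 0.0;
            for (int i = 0; i + n <= text.length(); i++) {
                String gram = text.substring(i, i + n);
                int count = counts.get(category).getOrDefault(gram, 0);
                score += Math.log((count + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] categories = {"alt.atheism", "misc.forsale"};
        ToyNGramClassifier classifier = new ToyNGramClassifier(categories, 3);
        classifier.train("alt.atheism",
            "arguments about belief and religion among atheists");
        classifier.train("misc.forsale",
            "brand new bike for sale, great price, will ship");
        System.out.println(classifier.bestCategory("bike for sale"));
    }
}
```

The sketch uses a small n-gram size of 3 because its training strings are only a sentence long; with real training files, a larger value such as the 6 used earlier becomes practical.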
We start by declaring the training directory: