Classification classification = classifier.classify(review);
String bestCategory = classification.bestCategory();
System.out.println("Best Category: " + bestCategory);
When executed, we get the following output:
Best Category: pos
The same approach can be applied to other categories of text.
Language identification using LingPipe
LingPipe comes with a model, langid-leipzig.classifier, that is trained for several
languages and is found in the demos/models directory. The supported languages are
listed in the following table. This model was developed using training data derived from
the Leipzig Corpora Collection (http://corpora.uni-leipzig.de/). Another language
identification tool can be found at http://code.google.com/p/language-detection/.
Language    Abbreviation    Language     Abbreviation
Catalan     cat             Italian      it
Danish      dk              Japanese     jp
English     en              Korean       kr
Estonian    ee              Norwegian    no
Finnish     fi              Sorbian      sorb
French      fr              Swedish      se
German      de              Turkish      tr
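The abbreviations listed above are the kind of category labels a classifier reports, so it can be handy to translate them back into readable language names. The following is a minimal sketch that assumes the labels match the table exactly. The LanguageNames class and its nameFor method are hypothetical helpers written for illustration and are not part of LingPipe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of LingPipe): maps the abbreviations from the
// table of supported languages to readable language names.
class LanguageNames {
    private static final Map<String, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put("cat", "Catalan");
        NAMES.put("dk", "Danish");
        NAMES.put("en", "English");
        NAMES.put("ee", "Estonian");
        NAMES.put("fi", "Finnish");
        NAMES.put("fr", "French");
        NAMES.put("de", "German");
        NAMES.put("it", "Italian");
        NAMES.put("jp", "Japanese");
        NAMES.put("kr", "Korean");
        NAMES.put("no", "Norwegian");
        NAMES.put("sorb", "Sorbian");
        NAMES.put("se", "Swedish");
        NAMES.put("tr", "Turkish");
    }

    // Returns the language name for a label, or the label itself if it is unknown.
    static String nameFor(String label) {
        return NAMES.getOrDefault(label, label);
    }

    public static void main(String[] args) {
        // Assumption: in a real program this value would come from
        // classification.bestCategory(); it is hard-coded here.
        String bestCategory = "en";
        System.out.println("Best Language: " + nameFor(bestCategory));
    }
}
```

Falling back to the raw label for unknown values keeps the output usable if a model reports a category that is not in the table.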
To use this model, we follow essentially the same code as in the Classifying text using
LingPipe section earlier in this chapter. We start with the same movie review of Forrest
Gump: