Survey of NLP tools
Many tools are available that support NLP. Some of these ship with the Java SE SDK but
are of limited use for all but the simplest types of problems. Other libraries, such as
Apache's OpenNLP and LingPipe, provide extensive and sophisticated support for NLP
problems.
Low-level Java support includes string classes such as String, StringBuilder, and
StringBuffer. These classes provide methods for searching, matching, and text
replacement. Regular expressions use a special pattern syntax to match substrings. Java
supports them both through String methods and through the java.util.regex package.
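The low-level support described above can be sketched as follows. This is a minimal illustration and not code from the original text: the class name StringSupportSketch, the helper findAll, and the sample sentence are assumptions made for the example. Only standard Java SE classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class StringSupportSketch {

    // Returns every substring of text that the regular expression matches.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "The cat sat on the mat.";

        // Searching: index of the first occurrence, or -1 if absent.
        System.out.println(text.indexOf("cat"));

        // Matching: String.matches tests the whole string against a pattern.
        System.out.println(text.matches(".*\\bmat\\b.*"));

        // Replacement on an immutable String returns a new String.
        System.out.println(text.replace("cat", "dog"));

        // StringBuilder supports in-place search and replacement.
        StringBuilder builder = new StringBuilder(text);
        int start = builder.indexOf("cat");
        builder.replace(start, start + "cat".length(), "dog");
        System.out.println(builder);

        // Regular expression: whole words ending in "at".
        System.out.println(findAll("\\b\\w+at\\b", text));
    }
}
```

String.matches is convenient for one-off checks, while a compiled Pattern with a Matcher is usually the better choice when the same expression is applied repeatedly or when each individual match is needed.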
As discussed earlier, tokenizers are used to split text into individual elements. Java
provides support for tokenization through:
• The String class' split method
• The StreamTokenizer class
• The StringTokenizer class
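The three approaches listed above can be compared side by side. The following is a minimal sketch, not the book's own code: the class name TokenizerSketch, the method names, and the sample text are assumptions chosen for illustration. The call to ordinaryChar('.') is also a choice made here, because StreamTokenizer by default treats the period as part of a number and may attach it to a preceding word.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name, used only for this illustration.
class TokenizerSketch {

    // String.split: breaks the text on runs of whitespace (a regular expression).
    static List<String> withSplit(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    // StringTokenizer: the default delimiters are whitespace characters.
    static List<String> withStringTokenizer(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // StreamTokenizer: classifies tokens as words, numbers, or single characters.
    static List<String> withStreamTokenizer(String text) throws IOException {
        List<String> tokens = new ArrayList<>();
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(text));
        // Assumption for this sketch: report '.' as its own token.
        tokenizer.ordinaryChar('.');
        while (tokenizer.nextToken() != StreamTokenizer.TT_EOF) {
            if (tokenizer.ttype == StreamTokenizer.TT_WORD) {
                tokens.add(tokenizer.sval);
            } else if (tokenizer.ttype == StreamTokenizer.TT_NUMBER) {
                tokens.add(String.valueOf(tokenizer.nval));
            } else {
                // Ordinary characters such as punctuation arrive in ttype.
                tokens.add(String.valueOf((char) tokenizer.ttype));
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        String text = "Boston, Massachusetts.";
        System.out.println(withSplit(text));
        System.out.println(withStringTokenizer(text));
        System.out.println(withStreamTokenizer(text));
    }
}
```

With this sample text, the first two methods leave punctuation attached to the words, while the StreamTokenizer version returns the comma and the period as separate tokens. None of the three handles language-specific cases such as contractions or abbreviations, which is one reason dedicated NLP libraries are used for tokenization.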
A number of NLP libraries/APIs also exist for Java. A partial list of Java-based NLP
APIs is given in the following table. Most of these are open source. In addition, a
number of commercial APIs are available. We will focus on the open source APIs:
API                                          URL
Apertium                                     http://www.apertium.org/
General Architecture for Text Engineering    http://gate.ac.uk/
Learning Based Java                          http://cogcomp.cs.illinois.edu/page/software_view/LBJ
LinguaStream                                 http://www.linguastream.org/
LingPipe                                     http://alias-i.com/lingpipe/
Mallet                                       http://mallet.cs.umass.edu/
MontyLingua                                  http://web.media.mit.edu/~hugo/montylingua/