Using the NLP APIs
We will demonstrate POS tagging using OpenNLP, Stanford API, and LingPipe. Each of […]ture of Twenty Thousand Leagues Under the Sea by Jules Verne:
private String[] sentence = {"The", "voyage", "of", "the",
    "Abraham", "Lincoln", "was", "for", "a", "long", "time",
    "marked", "by", "no", "special", "incident."};
The text to be processed may not always be defined in this fashion. Sometimes the sentence
will be available as a single string:
String theSentence = "The voyage of the Abraham Lincoln was for a "
    + "long time marked by no special incident.";
We may need to convert such a string into an array of words. There are numerous techniques for doing this; the following tokenizeSentence method performs the conversion:
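public String[] tokenizeSentence(String sentence) {
    // Split on runs of one or more whitespace characters
    // (spaces, tabs, newlines); "\\s+" is the regex \s+
    String[] words = sentence.split("\\s+");
    return words;
}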
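Since several techniques exist, the sketch below compares a few of them. It assumes input may carry leading whitespace, where a plain split("\\s+") returns an empty first element. The class TokenizeDemo and its three methods are hypothetical names for illustration. java.util.StringTokenizer is a standard JDK class. The punctuation-splitting regex is an assumed, deliberately simple rule, not the tokenizer used by any of the NLP libraries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class comparing a few ways to turn a sentence into words.
public class TokenizeDemo {

    // Runs of word characters, or any single character that is neither a
    // word character nor whitespace (such as punctuation).
    private static final Pattern WORD_OR_PUNCT = Pattern.compile("\\w+|[^\\w\\s]");

    // Whitespace split, trimming first: split("\\s+") on a string with
    // leading blanks would otherwise return an empty first token.
    public static String[] whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would return {""}
        }
        return trimmed.split("\\s+");
    }

    // java.util.StringTokenizer with its default delimiters
    // (space, tab, newline, carriage return, form feed).
    public static String[] tokenizerTokens(String text) {
        StringTokenizer tokenizer = new StringTokenizer(text);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    // Assumed regex rule that also separates punctuation, so "incident."
    // becomes "incident" and ".".
    public static String[] punctuationTokens(String text) {
        Matcher matcher = WORD_OR_PUNCT.matcher(text);
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Note the two leading spaces
        String text = "  The voyage of the Abraham Lincoln was for a "
                + "long time marked by no special incident.";
        // Plain split: the first element is an empty string
        System.out.println(Arrays.toString(text.split("\\s+")));
        System.out.println(Arrays.toString(whitespaceTokens(text)));
        System.out.println(Arrays.toString(tokenizerTokens(text)));
        System.out.println(Arrays.toString(punctuationTokens(text)));
    }
}
```

The whitespace approaches keep "incident." as a single token, exactly as in the sentence array above. The regex variant separates the period but is naive: it also splits "don't" into "don", "'", and "t". For real tagging work, the tokenizers shipped with the NLP libraries are usually a better choice than a hand-written rule.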
The following code demonstrates the use of this method:
String[] words = tokenizeSentence(theSentence);
for (String word : words) {
    System.out.print(word + " ");
}
System.out.println();
The output is as follows:
The voyage of the Abraham Lincoln was for a long time marked
by no special incident.
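The two representations of the sentence are interchangeable. The sketch below checks that splitting the single string reproduces the array form token for token. It then uses String.join (Java 8 and later) to go the other way, which avoids the trailing space left by the print loop. The class name SentenceRoundTrip and its constant names are hypothetical. tokenizeSentence is repeated as a static method, assuming the whitespace-splitting version.

```java
import java.util.Arrays;

// Hypothetical class showing the round trip between the two sentence forms.
public class SentenceRoundTrip {

    // The pre-tokenized form of the sample sentence.
    static final String[] SENTENCE = {"The", "voyage", "of", "the",
            "Abraham", "Lincoln", "was", "for", "a", "long", "time",
            "marked", "by", "no", "special", "incident."};

    // The same sentence as a single string.
    static final String THE_SENTENCE = "The voyage of the Abraham Lincoln was for a "
            + "long time marked by no special incident.";

    // Static copy of tokenizeSentence: split on runs of whitespace.
    static String[] tokenizeSentence(String sentence) {
        return sentence.split("\\s+");
    }

    public static void main(String[] args) {
        String[] words = tokenizeSentence(THE_SENTENCE);
        // String -> array: matches the hand-written array exactly
        System.out.println(Arrays.equals(words, SENTENCE));
        // Array -> string: String.join inserts separators only between tokens
        System.out.println(String.join(" ", words).equals(THE_SENTENCE));
    }
}
```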