Summary
We discussed many of the issues that make sentence detection a difficult task. These include problems caused by periods that appear in numbers and abbreviations. Ellipses and embedded quotes can also be problematic.
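To make the period problem concrete, here is a minimal sketch (not code from the chapter) that splits a text containing both an abbreviation and a decimal number. The class and method names are hypothetical. Splitting on every period breaks the text inside "Dr." and inside "3.50". Requiring whitespace after the terminator keeps the number intact but still splits after the abbreviation.

```java
import java.util.Arrays;

public class NaiveSentenceSplit {

    // Splits on every period: the simplest and most fragile rule.
    static String[] splitOnEveryPeriod(String text) {
        return text.split("\\.");
    }

    // Splits only where '.', '?' or '!' is followed by whitespace.
    // The lookbehind keeps the punctuation attached to its sentence.
    static String[] splitOnTerminatorAndSpace(String text) {
        return text.split("(?<=[.?!])\\s+");
    }

    public static void main(String[] args) {
        String text = "Dr. Smith paid 3.50 dollars. He left.";
        // Breaks inside "Dr." and inside the number "3.50"
        System.out.println(Arrays.toString(splitOnEveryPeriod(text)));
        // Keeps "3.50" intact but still breaks after the abbreviation "Dr."
        System.out.println(Arrays.toString(splitOnTerminatorAndSpace(text)));
    }
}
```

The first call yields four fragments and the second yields three, with "Dr." wrongly treated as a complete sentence.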
Java provides a couple of techniques for detecting the end of a sentence. We saw how regular expressions and the BreakIterator class can be used. These techniques work for simple sentences but do not handle more complicated sentences well.
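As a reminder of how the BreakIterator approach works, here is a minimal sketch using the JDK's `BreakIterator.getSentenceInstance`. The class and method names are hypothetical, and the sample text deliberately avoids abbreviations, since how "Dr." or "Mr." is treated depends on the JDK's built-in rules. The iterator returns boundary offsets, and each sentence is the substring between consecutive boundaries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakIteratorSentences {

    // Returns the sentences found by the JDK's locale-sensitive sentence
    // BreakIterator, trimmed of surrounding whitespace.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        // Walk consecutive boundary pairs until DONE is returned
        for (int end = iterator.next(); end != BreakIterator.DONE;
                start = end, end = iterator.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The value is 3.14 today. Is it rising? It may be!";
        for (String s : sentences(text, Locale.US)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Unlike the naive period split, the iterator does not break inside "3.14" and recognizes "?" and "!" as terminators.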
We also illustrated the use of various NLP APIs. Some of these process text based on rules, while others use models. We also demonstrated how models can be trained and evaluated.
In the next chapter, you will learn how to find people and things in text.