Using OpenNLP for NER
We will demonstrate the use of the TokenNameFinderModel class to perform NER using the OpenNLP API. Additionally, we will demonstrate how to determine the probability that an identified entity is correct.
The general approach is to convert the text into a series of tokenized sentences, create an instance of the TokenNameFinderModel class using an appropriate model, wrap that model in a NameFinderME instance, and then use the NameFinderME find method to identify the entities in the text.
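To make the data flow concrete, the following sketch shows the shape of the result: the finder receives an array of tokens and returns spans expressed as token indexes, with an inclusive start and an exclusive end. This is only a toy stand-in and not the OpenNLP API. The SpanSketch and TokenSpan names are hypothetical, the tokens are written by hand, and a fixed name list replaces the trained statistical model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanSketch {

    /** Hypothetical span over token indexes; start is inclusive, end is exclusive. */
    public static final class TokenSpan {
        public final int start;
        public final int end;
        public final String type;

        public TokenSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Toy finder (assumption): tags every token that appears in a fixed set of names. */
    public static List<TokenSpan> find(String[] tokens, Set<String> knownNames) {
        List<TokenSpan> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (knownNames.contains(tokens[i])) {
                spans.add(new TokenSpan(i, i + 1, "person"));
            }
        }
        return spans;
    }

    /** Joins the tokens covered by a span back into text. */
    public static String coveredText(String[] tokens, TokenSpan span) {
        return String.join(" ", Arrays.copyOfRange(tokens, span.start, span.end));
    }

    public static void main(String[] args) {
        // Hand-written tokens; a real tokenizer may split the sentence differently.
        String[] tokens = {"He", "was", "the", "last", "person", "to", "see", "Fred", "."};
        Set<String> knownNames = new HashSet<>(Arrays.asList("Fred"));
        for (TokenSpan span : find(tokens, knownNames)) {
            // Prints: person [7..8): Fred
            System.out.println(span.type + " [" + span.start + ".." + span.end + "): "
                    + coveredText(tokens, span));
        }
    }
}
```

Because the spans index into the token array rather than the original string, the tokens must be kept around to recover the entity text.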
The following example demonstrates the use of the TokenNameFinderModel class. We will start with a single simple sentence and then move on to multiple sentences. The sentence is defined here:
String sentence = "He was the last person to see Fred.";
We will use the models found in the en-token.bin and en-ner-person.bin files for the tokenizer and name finder models, respectively. The InputStream objects for these files are opened using a try-with-resources block, as shown here:
try (InputStream tokenStream = new FileInputStream(
        new File(getModelDir(), "en-token.bin"));
    InputStream modelStream = new FileInputStream(
        new File(getModelDir(), "en-ner-person.bin"))) {
    ...
} catch (Exception ex) {
    // Handle exceptions
}
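A try-with-resources block closes every declared resource automatically when the block exits, in the reverse of the order in which the resources were declared. The following self-contained sketch illustrates this with in-memory streams standing in for the two model files. The TryWithResourcesSketch class and its open helper are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {

    // Records the order in which the streams are closed.
    private static final List<String> closed = new ArrayList<>();

    /** Hypothetical helper: an empty in-memory stream that logs its own close. */
    public static InputStream open(final String name) {
        return new ByteArrayInputStream(new byte[0]) {
            @Override
            public void close() throws IOException {
                closed.add(name);
                super.close();
            }
        };
    }

    /** Opens two streams in one try-with-resources block and returns the close order. */
    public static List<String> run() throws IOException {
        closed.clear();
        try (InputStream tokenStream = open("en-token.bin");
             InputStream modelStream = open("en-ner-person.bin")) {
            // Both streams are open and usable here.
            tokenStream.read();
            modelStream.read();
        }
        // Both streams have been closed by this point.
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) throws IOException {
        // Prints: [en-ner-person.bin, en-token.bin]
        System.out.println(run());
    }
}
```

The same guarantee applies when an exception is thrown inside the block, which is why no explicit close calls are needed for the model streams.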
Within the try block, the TokenizerModel and Tokenizer objects are created:
TokenizerModel tokenModel = new TokenizerModel(tokenStream);
Tokenizer tokenizer = new TokenizerME(tokenModel);
Next, an instance of the NameFinderME class is created using the person model: