Using OpenNLP for NER
We will demonstrate the use of the TokenNameFinderModel class to perform NER using
the OpenNLP API. Additionally, we will demonstrate how to determine the probability
that an identified entity is correct.
The general approach is to convert the text into a series of tokenized sentences, load an
appropriate model into an instance of the TokenNameFinderModel class, create a
NameFinderME instance from that model, and then use its find method to identify the
entities in the text.
The following example demonstrates the use of the TokenNameFinderModel class. We
will use a simple sentence initially and then use multiple sentences. The sentence is defined
here:
String sentence = "He was the last person to see Fred.";
We will use the models found in the en-token.bin and en-ner-person.bin files
for the tokenizer and name finder models, respectively. The InputStream objects for
these files are opened using a try-with-resources block, as shown here:
try (InputStream tokenStream = new FileInputStream(
         new File(getModelDir(), "en-token.bin"));
     InputStream modelStream = new FileInputStream(
         new File(getModelDir(), "en-ner-person.bin"))) {
    ...
} catch (Exception ex) {
    // Handle exceptions
}
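To make the behavior of the try-with-resources block concrete, the following minimal sketch opens two streams in one header and records the order in which they are closed. It uses in-memory stand-ins rather than the real model files, and the class name TryWithResourcesSketch and the helper trackedStream are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesSketch {

    // Hypothetical stand-in for a model file stream: an empty in-memory
    // stream that records its name in the given list when it is closed.
    static InputStream trackedStream(String name, List<String> closed) {
        return new ByteArrayInputStream(new byte[0]) {
            @Override
            public void close() throws IOException {
                closed.add(name);
                super.close();
            }
        };
    }

    // Opens two streams in a single try-with-resources header and
    // returns the names of the streams in the order they were closed.
    static List<String> openAndClose() {
        List<String> closed = new ArrayList<>();
        try (InputStream tokenStream = trackedStream("en-token.bin", closed);
             InputStream modelStream = trackedStream("en-ner-person.bin", closed)) {
            // The tokenizer and name finder models would be loaded here.
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return closed;
    }

    public static void main(String[] args) {
        // Prints [en-ner-person.bin, en-token.bin]
        System.out.println(openAndClose());
    }
}
```

Both streams are closed automatically when the block exits, in the reverse of the order in which they were declared, so no explicit close calls or finally block are needed.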
Within the try block, the TokenizerModel and Tokenizer objects are created:
TokenizerModel tokenModel = new TokenizerModel(tokenStream);
Tokenizer tokenizer = new TokenizerME(tokenModel);
Next, an instance of the NameFinderME class is created using the person model:
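A minimal sketch of this step, together with the call to the find method and the probability lookup described earlier, might look like the following. It assumes the OpenNLP 1.x classes NameFinderME, TokenNameFinderModel, and Span are on the classpath and that both model files have been downloaded. The class name PersonFinderSketch, the getModelDir implementation, the models directory, and the output format are assumptions made for illustration and do not come from the text above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

class PersonFinderSketch {

    // Hypothetical helper: the directory that holds the model files.
    static File getModelDir() {
        return new File("models");
    }

    public static void main(String[] args) {
        String sentence = "He was the last person to see Fred.";

        try (InputStream tokenStream = new FileInputStream(
                 new File(getModelDir(), "en-token.bin"));
             InputStream modelStream = new FileInputStream(
                 new File(getModelDir(), "en-ner-person.bin"))) {

            // Tokenizer built from the tokenizer model
            TokenizerModel tokenModel = new TokenizerModel(tokenStream);
            Tokenizer tokenizer = new TokenizerME(tokenModel);

            // Name finder built from the person model
            TokenNameFinderModel entityModel =
                new TokenNameFinderModel(modelStream);
            NameFinderME nameFinder = new NameFinderME(entityModel);

            // find works on tokens and returns spans of token indexes
            String[] tokens = tokenizer.tokenize(sentence);
            Span[] nameSpans = nameFinder.find(tokens);

            // One probability per span, in the same order as the spans
            double[] probabilities = nameFinder.probs(nameSpans);
            String[] names = Span.spansToStrings(nameSpans, tokens);

            for (int i = 0; i < nameSpans.length; i++) {
                System.out.println(nameSpans[i].getType() + ": "
                    + names[i] + " (probability " + probabilities[i] + ")");
            }
        } catch (Exception ex) {
            // Handle exceptions
            ex.printStackTrace();
        }
    }
}
```

Each Span holds the start and end token indexes of an entity rather than the text itself, which is why the sketch converts the spans back to strings before printing them.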