Using LingPipe for NER
We demonstrated LingPipe with regular expressions in the Using regular
expressions for NER section earlier in this chapter. Here, we will demonstrate how
named entity models and the ExactDictionaryChunker class are used to perform
NER analysis.
Using LingPipe's named entity models
LingPipe has a few named entity models that we can use with chunking. Each model file
contains a serialized object that can be read from the file and then applied to text. These
objects implement the Chunker interface. The chunking process produces a Chunking
object, whose Chunk objects identify the entities of interest.
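As a rough illustration of what "a serialized object that can be read from a file and then applied to text" involves, the following sketch uses only standard java.io serialization. The KeywordModel class and the save, load, and roundTripFind methods are hypothetical stand-ins invented for this example; they are not part of LingPipe, and a real model is loaded through LingPipe's own classes rather than this code.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializedModelSketch {

    // Hypothetical stand-in for a trained model; NOT a LingPipe class.
    static class KeywordModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String keyword;

        KeywordModel(String keyword) {
            this.keyword = keyword;
        }

        // "Applies" the model to text: offset of the keyword, or -1 if absent.
        int find(String text) {
            return text.indexOf(keyword);
        }
    }

    // Writes a serializable object to a file.
    static void save(Serializable model, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Reads the object back; the caller casts it to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }

    // Saves a model to a temporary file, reloads it, and applies it to text.
    static int roundTripFind(String keyword, String text) {
        try {
            File file = File.createTempFile("model", ".ser");
            file.deleteOnExit();
            save(new KeywordModel(keyword), file);
            KeywordModel restored = (KeywordModel) load(file);
            return restored.find(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints 12, the offset of "Boston" in the sentence.
        System.out.println(roundTripFind("Boston", "She flew to Boston."));
    }
}
```

Deserialization can fail in two distinct ways: the file may be unreadable (IOException), or the class of the stored object may be missing from the classpath (ClassNotFoundException). Both are checked exceptions, which is why loading code of this kind is normally wrapped in a try-catch block.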
A list of the NER models is found in the following table. These models can be downloaded
from http://alias-i.com/lingpipe/web/models.html:

Genre             Corpus   File
English News      MUC-6    ne-en-news-muc6.AbstractCharLmRescoringChunker
English Genes     GeneTag  ne-en-bio-genetag.HmmChunker
English Genomics  GENIA    ne-en-bio-genia.TokenShapeChunker

We will use the model found in the ne-en-news-muc6.AbstractCharLmRescoringChunker
file to demonstrate how a model is used. We start with a try-catch block to deal with
exceptions, as shown in the following example. The file is opened and passed to the static
readObject method of the AbstractExternalizable class. This method reads in the
serialized model and returns an object that we cast to the Chunker interface:
try {
    File modelFile = new File(getModelDir(),
        "ne-en-news-muc6.AbstractCharLmRescoringChunker");
    Chunker chunker = (Chunker)
        AbstractExternalizable.readObject(modelFile);
    ...