Using LingPipe for NER
We previously demonstrated how to use LingPipe with regular expressions in the Using regular expressions for NER section earlier in this chapter. Here, we will demonstrate how named entity models and the ExactDictionaryChunker class are used to perform NER analysis.
Using LingPipe's named entity models
LingPipe has a few named entity models that we can use with chunking. Each model file contains a serialized object that can be read from the file and then applied to text. These objects implement the Chunker interface. Chunking a piece of text produces a Chunking object, whose chunks identify the entities of interest.
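To make the idea of "a serialized object that can be read from a file and then applied to text" concrete, here is a minimal sketch that uses only standard Java serialization. The SimpleChunker interface, the Chunk class, and the CapitalizedRunChunker "model" are hypothetical stand-ins invented for this illustration; they are not LingPipe's Chunker, Chunking, or Chunk types, and the capitalization regex is a deliberately naive rule rather than a trained model.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SerializedChunkerDemo {

    /** A detected entity: character offsets [start, end) plus a type label. */
    static final class Chunk {
        final int start;
        final int end;
        final String type;

        Chunk(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Hypothetical chunker contract; NOT LingPipe's Chunker interface. */
    interface SimpleChunker extends Serializable {
        List<Chunk> chunk(String text);
    }

    /** Toy "model": tags each run of capitalized words with a fixed label. */
    static final class CapitalizedRunChunker implements SimpleChunker {
        private static final long serialVersionUID = 1L;
        private static final Pattern NAME_RUN =
            Pattern.compile("\\b[A-Z][a-z]+(?:\\s+[A-Z][a-z]+)*\\b");

        private final String label; // instance state that is written to the file

        CapitalizedRunChunker(String label) {
            this.label = label;
        }

        @Override
        public List<Chunk> chunk(String text) {
            List<Chunk> chunks = new ArrayList<>();
            Matcher m = NAME_RUN.matcher(text);
            while (m.find()) {
                chunks.add(new Chunk(m.start(), m.end(), label));
            }
            return chunks;
        }
    }

    /** Writes the chunker to a file using standard Java serialization. */
    static void save(SimpleChunker chunker, File file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(chunker);
        }
    }

    /** Reads the chunker back; readObject returns Object, so a cast is needed. */
    static SimpleChunker load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new FileInputStream(file))) {
            return (SimpleChunker) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File modelFile = File.createTempFile("toy-chunker", ".ser");
        modelFile.deleteOnExit();
        save(new CapitalizedRunChunker("NAME"), modelFile);

        // Load the "model" from disk, then apply it to text
        SimpleChunker chunker = load(modelFile);
        String text = "the meeting between John Smith and Mary took place in Paris.";
        for (Chunk c : chunker.chunk(text)) {
            System.out.println(c.type + ": " + text.substring(c.start, c.end));
        }
    }
}
```

Running main should print NAME: John Smith, NAME: Mary, and NAME: Paris. Because the toy rule only looks at capitalization, a capitalized sentence-initial word would also be tagged, which is one reason to rely on trained models such as those listed below. With LingPipe itself, the loading step is done by AbstractExternalizable.readObject rather than a hand-written load method.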
A list of the NER models is found in the following table. These models can be downloaded
Genre             Corpus    File
English News      MUC-6     ne-en-news-muc6.AbstractCharLmRescoringChunker
English Genes     GeneTag   ne-en-bio-genetag.HmmChunker
English Genomics  GENIA     ne-en-bio-genia.TokenShapeChunker
We will use the model in the ne-en-news-muc6.AbstractCharLmRescoringChunker file to demonstrate how these models are used. We start with a try-catch block to handle exceptions, as shown in the following example. The model file is passed to the static readObject method of the AbstractExternalizable class, which reads in the serialized model and returns an object implementing the Chunker interface:
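try {
    // getModelDir() is a helper that returns the directory holding the downloaded models
    File modelFile = new File(getModelDir(),
        "ne-en-news-muc6.AbstractCharLmRescoringChunker");
    // readObject returns an Object, so cast it to the Chunker interface
    Chunker chunker = (Chunker)
        AbstractExternalizable.readObject(modelFile);
    ...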