Using LingPipe POS taggers
LingPipe uses the Tagger interface to support POS tagging. This interface has a single method, tag, which takes a List of tokens and returns a Tagging object holding those tokens together with their tags. The interface is implemented by the ChainCrf and HmmDecoder classes.
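To make the shape of this API concrete without depending on the LingPipe JAR, the following sketch defines hypothetical stand-ins that mirror the single-method design: a tagger receives a list of tokens and returns an object pairing each token with a tag. ToyTagging, ToyTagger, and DictionaryTagger are our own illustrative names, not LingPipe classes, and the dictionary lookup with an "NN" default is only a placeholder for a real statistical model.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class TaggerShapeDemo {

    // Hypothetical stand-in for a tagging result: parallel lists of tokens and tags.
    static final class ToyTagging {
        private final List<String> tokens;
        private final List<String> tags;

        ToyTagging(List<String> tokens, List<String> tags) {
            if (tokens.size() != tags.size()) {
                throw new IllegalArgumentException("tokens and tags must have equal length");
            }
            this.tokens = Collections.unmodifiableList(new ArrayList<>(tokens));
            this.tags = Collections.unmodifiableList(new ArrayList<>(tags));
        }

        int size() { return tokens.size(); }
        String token(int i) { return tokens.get(i); }
        String tag(int i) { return tags.get(i); }
    }

    // Hypothetical single-method tagger interface: tokens in, tagging out.
    interface ToyTagger {
        ToyTagging tag(List<String> tokens);
    }

    // Placeholder implementation: dictionary lookup, defaulting unknown words to "NN".
    static final class DictionaryTagger implements ToyTagger {
        private final Map<String, String> lexicon;

        DictionaryTagger(Map<String, String> lexicon) {
            this.lexicon = lexicon;
        }

        @Override
        public ToyTagging tag(List<String> tokens) {
            List<String> tags = new ArrayList<>();
            for (String token : tokens) {
                tags.add(lexicon.getOrDefault(token.toLowerCase(), "NN"));
            }
            return new ToyTagging(tokens, tags);
        }
    }

    public static void main(String[] args) {
        ToyTagger tagger = new DictionaryTagger(Map.of("the", "AT", "runs", "VBZ"));
        ToyTagging tagging = tagger.tag(List.of("The", "dog", "runs"));
        for (int i = 0; i < tagging.size(); i++) {
            System.out.println(tagging.token(i) + "/" + tagging.tag(i));
        }
    }
}
```

Running main prints The/AT, dog/NN, and runs/VBZ on separate lines. Keeping the result as one object with parallel token and tag lists, rather than as separate per-word objects, makes it easy to walk the sentence by index.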
The ChainCrf class uses linear-chain conditional random field decoding and estimation to determine tags. The HmmDecoder class uses a hidden Markov model (HMM) to perform tagging. We will illustrate the HmmDecoder class next.
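The standard way to find the single most probable tag sequence under an HMM is Viterbi decoding, a dynamic-programming search over tag paths. The following is a minimal sketch of that technique, not LingPipe's implementation: the three tags, the three-word vocabulary, and all probabilities are invented toy values, and scores are kept in log space to avoid numeric underflow on long sentences.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ToyViterbi {

    // Converts probabilities to log probabilities (log 0 becomes negative infinity).
    static double[] toLog(double[] p) {
        double[] out = new double[p.length];
        for (int i = 0; i < p.length; i++) {
            out[i] = Math.log(p[i]);
        }
        return out;
    }

    static double[][] toLog(double[][] p) {
        double[][] out = new double[p.length][];
        for (int i = 0; i < p.length; i++) {
            out[i] = toLog(p[i]);
        }
        return out;
    }

    // Viterbi decoding: returns the most probable (first-best) tag index sequence
    // for the given word indices. All inputs are log probabilities.
    static List<Integer> decode(double[] logStart, double[][] logTrans,
                                double[][] logEmit, int[] words) {
        int n = words.length;
        int numTags = logStart.length;
        if (n == 0) {
            return new ArrayList<>();
        }
        double[][] best = new double[n][numTags]; // best log score of a path ending in tag t at position i
        int[][] backPtr = new int[n][numTags];    // previous tag on that best path

        for (int t = 0; t < numTags; t++) {
            best[0][t] = logStart[t] + logEmit[t][words[0]];
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < numTags; t++) {
                double bestScore = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int prev = 0; prev < numTags; prev++) {
                    double score = best[i - 1][prev] + logTrans[prev][t];
                    if (score > bestScore) {
                        bestScore = score;
                        bestPrev = prev;
                    }
                }
                best[i][t] = bestScore + logEmit[t][words[i]];
                backPtr[i][t] = bestPrev;
            }
        }
        // Choose the best final tag, then follow back-pointers to recover the path.
        int last = 0;
        for (int t = 1; t < numTags; t++) {
            if (best[n - 1][t] > best[n - 1][last]) {
                last = t;
            }
        }
        List<Integer> path = new ArrayList<>();
        path.add(last);
        for (int i = n - 1; i > 0; i--) {
            last = backPtr[i][last];
            path.add(last);
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        String[] tagNames = {"DET", "NOUN", "VERB"};
        // Toy vocabulary: 0 = "the", 1 = "dog", 2 = "runs". All numbers are invented.
        double[] start = {0.8, 0.1, 0.1};
        double[][] trans = {          // trans[prev][next]
            {0.05, 0.90, 0.05},       // from DET
            {0.10, 0.30, 0.60},       // from NOUN
            {0.50, 0.30, 0.20}        // from VERB
        };
        double[][] emit = {           // emit[tag][word]
            {0.90, 0.05, 0.05},       // DET
            {0.05, 0.60, 0.35},       // NOUN
            {0.05, 0.25, 0.70}        // VERB
        };
        List<Integer> path = decode(toLog(start), toLog(trans), toLog(emit), new int[] {0, 1, 2});
        List<String> names = new ArrayList<>();
        for (int t : path) {
            names.add(tagNames[t]);
        }
        System.out.println(String.join(" ", names));
    }
}
```

With these toy numbers, running main prints DET NOUN VERB for "the dog runs". Because each cell keeps only the best-scoring predecessor, the work grows linearly with sentence length rather than exponentially with the number of possible tag sequences.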
The HmmDecoder class uses the tag method to determine the most likely (first-best) tags. It also has a tagNBest method that scores the possible taggings and returns an iterator over these scored taggings.

LingPipe comes with three POS models, which can be downloaded from http://alias-i.com/lingpipe/web/models.html. These are listed in the following table. For our demonstration, we will use the Brown Corpus model:
Model                                      File
English General Text: Brown Corpus         pos-en-general-brown.HiddenMarkovModel
English Biomedical Text: MedPost Corpus    pos-en-bio-medpost.HiddenMarkovModel
English Biomedical Text: GENIA Corpus      pos-en-bio-genia.HiddenMarkovModel
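The difference between first-best tagging with tag and ranked alternatives with tagNBest can be illustrated with a brute-force sketch over a small HMM. ScoredPath and nBest are hypothetical names of our own, not LingPipe's API, and the probabilities are invented. Enumerating every tag sequence is exponential in sentence length, so this approach only works for tiny inputs; practical n-best decoders avoid it with dynamic programming and search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

public class ToyNBest {

    // Hypothetical result type: a tag index sequence with its joint log probability.
    static final class ScoredPath {
        final List<Integer> tags;
        final double logProb;

        ScoredPath(List<Integer> tags, double logProb) {
            this.tags = tags;
            this.logProb = logProb;
        }
    }

    // Scores every possible tag sequence under the HMM and returns an iterator over
    // the top maxResults, best first. Exponential in sentence length: toy inputs only.
    static Iterator<ScoredPath> nBest(double[] start, double[][] trans, double[][] emit,
                                      int[] words, int maxResults) {
        if (words.length == 0 || maxResults <= 0) {
            return Collections.emptyIterator();
        }
        List<ScoredPath> all = new ArrayList<>();
        enumerate(0, new int[words.length], start, trans, emit, words, all);
        all.sort(Comparator.comparingDouble((ScoredPath p) -> p.logProb).reversed());
        return all.subList(0, Math.min(maxResults, all.size())).iterator();
    }

    // Recursively fills current[pos..] with every tag choice and scores complete sequences.
    private static void enumerate(int pos, int[] current, double[] start, double[][] trans,
                                  double[][] emit, int[] words, List<ScoredPath> out) {
        if (pos == words.length) {
            double logProb = Math.log(start[current[0]]) + Math.log(emit[current[0]][words[0]]);
            for (int i = 1; i < words.length; i++) {
                logProb += Math.log(trans[current[i - 1]][current[i]])
                         + Math.log(emit[current[i]][words[i]]);
            }
            List<Integer> tags = new ArrayList<>();
            for (int t : current) {
                tags.add(t);
            }
            out.add(new ScoredPath(tags, logProb));
            return;
        }
        for (int t = 0; t < start.length; t++) {
            current[pos] = t;
            enumerate(pos + 1, current, start, trans, emit, words, out);
        }
    }

    public static void main(String[] args) {
        String[] tagNames = {"DET", "NOUN", "VERB"};
        // Invented toy HMM; word indices: 0 = "the", 1 = "dog", 2 = "runs".
        double[] start = {0.8, 0.1, 0.1};
        double[][] trans = {{0.05, 0.90, 0.05}, {0.10, 0.30, 0.60}, {0.50, 0.30, 0.20}};
        double[][] emit = {{0.90, 0.05, 0.05}, {0.05, 0.60, 0.35}, {0.05, 0.25, 0.70}};

        Iterator<ScoredPath> it = nBest(start, trans, emit, new int[] {0, 1, 2}, 3);
        while (it.hasNext()) {
            ScoredPath sp = it.next();
            List<String> names = new ArrayList<>();
            for (int t : sp.tags) {
                names.add(tagNames[t]);
            }
            System.out.printf("%s  p=%.4f%n", String.join(" ", names), Math.exp(sp.logProb));
        }
    }
}
```

With these numbers, DET NOUN VERB is ranked first and DET NOUN NOUN second; the first entry of an n-best list always coincides with the first-best answer.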
Using the HmmDecoder class with Best_First tags
We start with a try-with-resources block to handle exceptions, followed by the code that creates the HmmDecoder instance. The model is read from the file and then used as the argument of the HmmDecoder constructor:
try (
FileInputStream inputStream =
new FileInputStream(getModelDir()
+ "//pos-en-general-brown.HiddenMarkovModel");
ObjectInputStream objectStream =