Using LingPipe POS taggers
LingPipe uses the Tagger interface to support POS tagging. This interface has a single
method: tag. It takes a list of tokens and returns a Tagging object, which holds the
tokens and their corresponding tags. The interface is implemented by the ChainCrf and
HmmDecoder classes.
The ChainCrf class uses linear-chain conditional random field decoding and estimation
for determining tags. The HmmDecoder class uses an HMM to perform tagging. We will
illustrate this class next.
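To make the idea of HMM tagging concrete before turning to LingPipe itself, here is a minimal sketch of first-best decoding with the Viterbi algorithm. It is not LingPipe's implementation: the class name ToyViterbiTagger, the two-tag tag set, and every probability in it are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyViterbiTagger {
    // Hypothetical two-tag model; all numbers below are invented for this sketch.
    static final String[] TAGS = {"NOUN", "VERB"};
    static final double[] START = {0.7, 0.3};                  // P(tag at position 0)
    static final double[][] TRANS = {{0.3, 0.7}, {0.8, 0.2}};  // P(next tag | current tag)
    static final Map<String, double[]> EMIT = new HashMap<>(); // P(word | tag), indexed like TAGS
    static final double[] UNKNOWN = {0.01, 0.01};              // fallback for unseen words
    static {
        EMIT.put("dogs", new double[] {0.5, 0.1});
        EMIT.put("bark", new double[] {0.1, 0.6});
    }

    // Returns the single most likely tag sequence for the tokens.
    static List<String> tag(List<String> tokens) {
        int n = tokens.size();
        if (n == 0) {
            return new ArrayList<>();
        }
        int k = TAGS.length;
        double[][] best = new double[n][k]; // best log score of a path ending in tag t at position i
        int[][] back = new int[n][k];       // previous tag on that best path
        for (int i = 0; i < n; i++) {
            double[] emit = EMIT.getOrDefault(tokens.get(i), UNKNOWN);
            for (int t = 0; t < k; t++) {
                if (i == 0) {
                    best[i][t] = Math.log(START[t]) + Math.log(emit[t]);
                    continue;
                }
                best[i][t] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double score = best[i - 1][p] + Math.log(TRANS[p][t]) + Math.log(emit[t]);
                    if (score > best[i][t]) {
                        best[i][t] = score;
                        back[i][t] = p;
                    }
                }
            }
        }
        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (best[n - 1][t] > best[n - 1][last]) {
                last = t;
            }
        }
        String[] tags = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            tags[i] = TAGS[last];
            last = back[i][last];
        }
        return Arrays.asList(tags);
    }

    public static void main(String[] args) {
        System.out.println(tag(Arrays.asList("dogs", "bark")));
    }
}
```

With these toy numbers, the input "dogs bark" is tagged NOUN followed by VERB. A trained model such as the ones LingPipe distributes plays the role of the hard-coded tables here.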
The HmmDecoder class uses the tag method to determine the most likely (first-best) tags.
It also has a tagNBest method that scores the possible taggings and returns an iterator
over these scored taggings. Three POS models come with LingPipe and can be downloaded
from http://alias-i.com/lingpipe/web/models.html . These are listed in the following
table. For our demonstration, we will use the Brown Corpus model:
Model                                      File
English General Text: Brown Corpus         pos-en-general-brown.HiddenMarkovModel
English Biomedical Text: MedPost Corpus    pos-en-bio-medpost.HiddenMarkovModel
English Biomedical Text: GENIA Corpus      pos-en-bio-genia.HiddenMarkovModel
Using the HmmDecoder class with Best_First tags
We start with a try-with-resources block, which closes the input streams automatically
and lets us handle exceptions, followed by the code to create the HmmDecoder instance.
The model is read from the file and then used as the argument of the HmmDecoder
constructor:
try (
        FileInputStream inputStream =
            new FileInputStream(getModelDir()
                + "//pos-en-general-brown.HiddenMarkovModel");
        ObjectInputStream objectStream =
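The stream-handling pattern that the listing above begins can be sketched with only the standard library. The following is a minimal, hedged illustration, not the book's code: the class ModelLoadingSketch, its helper methods, and the HashMap used as a stand-in "model" are all hypothetical. With LingPipe, the object returned by readObject would instead be cast to the model type and passed to the HmmDecoder constructor, as described above.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;

public class ModelLoadingSketch {
    // Writes a stand-in "model" (a plain HashMap) to a temporary file so that
    // the sketch has something to read. This is an assumption, not a real model.
    static File writeStandInModel() {
        try {
            File file = File.createTempFile("stand-in-model", ".ser");
            file.deleteOnExit();
            HashMap<String, String> model = new HashMap<>();
            model.put("dogs", "NOUN");
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(model);
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads a serialized object back. Both streams are declared in the
    // try-with-resources header, so they are closed automatically, in reverse
    // order, whether or not readObject throws.
    static Object loadModel(File file) {
        try (FileInputStream inputStream = new FileInputStream(file);
             ObjectInputStream objectStream = new ObjectInputStream(inputStream)) {
            return objectStream.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not load model from " + file, e);
        }
    }

    public static void main(String[] args) {
        Object model = loadModel(writeStandInModel());
        System.out.println(model);
    }
}
```

Declaring both streams in the resource header, rather than nesting the constructors, keeps each stream individually closeable if a later constructor fails.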