new ObjectInputStream(inputStream);) {
    HiddenMarkovModel hmm = (HiddenMarkovModel)
        objectStream.readObject();
    HmmDecoder decoder = new HmmDecoder(hmm);
    …
} catch (IOException ex) {
    // Handle exceptions
} catch (ClassNotFoundException ex) {
    // Handle exceptions
}
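The loading pattern above (an ObjectInputStream opened in a try-with-resources block, a cast on the result of readObject, and handlers for IOException and ClassNotFoundException) can be tried without a model file. The following is only a minimal sketch using the standard library: ToyModel is a hypothetical stand-in for HiddenMarkovModel, and an in-memory byte array takes the place of the model file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class ModelLoadingSketch {

    // Hypothetical stand-in for a serialized model such as HiddenMarkovModel.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        ToyModel(String name) {
            this.name = name;
        }
    }

    // Serializes the model to a byte array (the role a model file plays on disk).
    static byte[] write(ToyModel model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    // Reads the model back; returns null if the data cannot be deserialized.
    static ToyModel read(byte[] data) {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ToyModel) in.readObject();
        } catch (IOException | ClassNotFoundException ex) {
            return null;
        }
    }

    public static void main(String[] args) {
        ToyModel restored = read(write(new ToyModel("pos-tagger")));
        System.out.println(restored == null ? "load failed" : restored.name);
    }
}
```

Returning null on failure keeps the sketch short; real code would more likely log the exception or rethrow it.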
We will perform tagging on the theSentence variable. First, it needs to be tokenized. We will use an Indo-European tokenizer as shown here. The tokenizer method requires that the text string be converted to an array of chars, which is passed along with a start offset and a length. The tokenize method then returns the tokens as an array of strings:
TokenizerFactory TOKENIZER_FACTORY =
    IndoEuropeanTokenizerFactory.INSTANCE;
char[] charArray = theSentence.toCharArray();
Tokenizer tokenizer =
    TOKENIZER_FACTORY.tokenizer(
        charArray, 0, charArray.length);
String[] tokens = tokenizer.tokenize();
The actual tagging is performed by the HmmDecoder class's tag method. However, this method requires a List of String tokens rather than an array. The list is created using the Arrays class's asList method. The Tagging class holds a sequence of tokens and their tags:
List<String> tokenList = Arrays.asList(tokens);
Tagging<String> tagString = decoder.tag(tokenList);
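One detail of Arrays.asList is worth keeping in mind here: the list it returns is a fixed-size view backed by the original array, not a copy. The following sketch uses only the standard library. The class name, helper methods, and sample tokens are illustrative and do not come from the original example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListSketch {

    // Wraps the array in a fixed-size List view; no elements are copied.
    static List<String> view(String[] tokens) {
        return Arrays.asList(tokens);
    }

    // Copies the tokens into an independent, resizable list.
    static List<String> copy(String[] tokens) {
        return new ArrayList<>(Arrays.asList(tokens));
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "voyage", "ended"};
        List<String> tokenList = view(tokens);

        // A change to the array is visible through the view.
        tokens[0] = "A";
        System.out.println(tokenList.get(0));

        // The view cannot grow or shrink.
        try {
            tokenList.add(".");
        } catch (UnsupportedOperationException ex) {
            System.out.println("cannot resize an Arrays.asList view");
        }
    }
}
```

For read-only use, such as passing tokens to a tagger, the view is sufficient and avoids a copy. If the list has to be modified later, wrapping it in an ArrayList as in copy is the usual approach.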
We are now ready to display the tokens and their tags. The following loop uses the token and tag methods to access the tokens and tags, respectively, in the Tagging object. Each pair is then displayed in token/tag form:
for (int i = 0; i < tagString.size(); ++i) {
    System.out.print(tagString.token(i) + "/"
        + tagString.tag(i) + " ");
}
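The token/tag output format can also be built as a single string, which is convenient for logging or testing. The sketch below is an assumption-laden illustration. It works on two plain parallel lists instead of a Tagging object, and the sample tokens and tag labels are invented for the example rather than taken from a real tagger run.

```java
import java.util.Arrays;
import java.util.List;

class TagFormatSketch {

    // Joins parallel token and tag lists as space-separated "token/tag" pairs.
    static String format(List<String> tokens, List<String> tags) {
        if (tokens.size() != tags.size()) {
            throw new IllegalArgumentException("tokens and tags differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); ++i) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(tokens.get(i)).append('/').append(tags.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only; not the output of an actual decoder.
        List<String> tokens = Arrays.asList("The", "voyage", "ended");
        List<String> tags = Arrays.asList("at", "nn", "vbd");
        System.out.println(format(tokens, tags)); // prints The/at voyage/nn ended/vbd
    }
}
```

Unlike the loop above, this version does not leave a trailing space after the last pair.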