Creating a dictionary from a file
If we need to create a new dictionary, one approach is to create an XML file containing all of the words and their tags, and then create the dictionary from the file. OpenNLP supports this approach with the POSDictionary class's create method.
The XML file consists of the dictionary root element, which contains a series of entry elements. Each entry element uses the tags attribute to specify the tags for the word, separated by spaces. The word itself is contained within the entry element as a token element. A simple example using two words stored in the file dictionary.txt is as follows:
<dictionary case_sensitive="false">
    <entry tags="JJ VB">
        <token>strong</token>
    </entry>
    <entry tags="NN VBP VB">
        <token>force</token>
    </entry>
</dictionary>
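To make the structure of this file concrete, the following is a minimal sketch that reads the same XML with the JDK's built-in DOM parser and collects each word with its tags. It is not how OpenNLP loads the file; the class name DictionaryXmlSketch and the method readEntries are hypothetical names introduced only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: illustrates the file layout only, not OpenNLP's own loader.
class DictionaryXmlSketch {

    // Maps each <token> word to the space-separated tags of its enclosing <entry>.
    static Map<String, List<String>> readEntries(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(in);
            Map<String, List<String>> entries = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("entry");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element entry = (Element) nodes.item(i);
                // The word is the text of the first <token> child.
                String word = entry.getElementsByTagName("token")
                        .item(0).getTextContent().trim();
                // The tags attribute holds one or more tags separated by whitespace.
                List<String> tags = Arrays.asList(
                        entry.getAttribute("tags").trim().split("\\s+"));
                entries.put(word, tags);
            }
            return entries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse dictionary XML", e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dictionary case_sensitive=\"false\">"
                + "<entry tags=\"JJ VB\"><token>strong</token></entry>"
                + "<entry tags=\"NN VBP VB\"><token>force</token></entry>"
                + "</dictionary>";
        try (InputStream in =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            readEntries(in).forEach(
                    (word, tags) -> System.out.println(word + " -> " + tags));
        }
    }
}
```

Running main prints one line per entry, showing that strong carries the tags JJ and VB while force carries NN, VBP, and VB.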
To create the dictionary, we pass an input stream for the file to the create method, as shown here:
try (InputStream dictionaryIn =
        new FileInputStream(new File("dictionary.txt"))) {
    POSDictionary dictionary =
        POSDictionary.create(dictionaryIn);
    …
} catch (IOException e) {
    // Handle exceptions
}
The POSDictionary class has an iterator method that returns an iterator object. Its next method returns a string for each word in the dictionary. We can use these methods to display the contents of the dictionary, as shown here:
Iterator<String> iterator = dictionary.iterator();
while (iterator.hasNext()) {
    String entry = iterator.next();