    dictionary.addEntry(
        new DictionaryEntry<String>("Joe", "PERSON", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Fred", "PERSON", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Boston", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("pub", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Vermont", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("IBM", "ORGANIZATION", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Sally", "PERSON", 1.0));
}
An ExactDictionaryChunker instance will use this dictionary. The arguments of
the ExactDictionaryChunker constructor are as follows:
Dictionary<String>: The dictionary containing the entities
TokenizerFactory: The tokenizer factory used by the chunker
boolean: If true, the chunker returns all matches, including overlapping ones
boolean: If true, matches are case sensitive
Matches can overlap. For example, in the phrase "The First National Bank", the
entity "bank" could match by itself or as part of the whole phrase. The
third parameter determines whether all of the matches are returned.
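To make the effect of the two boolean arguments concrete, here is a minimal sketch in plain Java. It is not LingPipe's implementation: the class DictionaryMatchSketch and its findMatches method are hypothetical names invented for this illustration, tokens are split on whitespace only, and when all matches are not requested the sketch assumes a leftmost, longest-match rule for resolving overlaps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of LingPipe.
class DictionaryMatchSketch {

    // Returns "phrase:CATEGORY" for each dictionary phrase found in the text.
    static List<String> findMatches(Map<String, String> dictionary, String text,
            boolean returnAllMatches, boolean caseSensitive) {
        // When case is ignored, lowercase the dictionary keys once up front.
        Map<String, String> lookup = new HashMap<>();
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            String key = caseSensitive
                ? e.getKey() : e.getKey().toLowerCase(Locale.ROOT);
            lookup.put(key, e.getValue());
        }
        // Simplification: whitespace tokenization, no punctuation handling.
        String[] tokens = text.trim().split("\\s+");
        List<String> matches = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            String longest = null;
            int longestEnd = -1;
            StringBuilder phrase = new StringBuilder();
            // Grow the candidate phrase one token at a time.
            for (int end = start; end < tokens.length; end++) {
                if (end > start) {
                    phrase.append(' ');
                }
                phrase.append(tokens[end]);
                String key = caseSensitive
                    ? phrase.toString()
                    : phrase.toString().toLowerCase(Locale.ROOT);
                String category = lookup.get(key);
                if (category != null) {
                    String match = phrase + ":" + category;
                    if (returnAllMatches) {
                        matches.add(match);
                    }
                    longest = match;
                    longestEnd = end;
                }
            }
            if (!returnAllMatches && longest != null) {
                // Assumed rule: keep the longest match and skip past it.
                matches.add(longest);
                start = longestEnd + 1;
            } else {
                start++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("First National Bank", "ORGANIZATION");
        dictionary.put("bank", "ORGANIZATION");
        String text = "The First National Bank";
        // All matches, case ignored: both the full phrase and "Bank" are found.
        System.out.println(findMatches(dictionary, text, true, false));
        // Non-overlapping matches only: just the full phrase is kept.
        System.out.println(findMatches(dictionary, text, false, false));
        // Case sensitive: "Bank" no longer matches the entry "bank".
        System.out.println(findMatches(dictionary, text, true, true));
    }
}
```

With the third argument set to true, both the full phrase and the embedded "Bank" are reported. Setting it to false keeps only the longer phrase, and turning on case sensitivity drops "Bank" because the dictionary entry is lowercase.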
In the following sequence, the dictionary is initialized. We then create an instance of the
ExactDictionaryChunker class using the Indo-European tokenizer, where we
return all matches and ignore the case of the tokens:
initializeDictionary();
ExactDictionaryChunker dictionaryChunker
    = new ExactDictionaryChunker(dictionary,
        IndoEuropeanTokenizerFactory.INSTANCE, true, false);