Finding People and Things - Natural Language Processing with Java

    dictionary.addEntry(
        new DictionaryEntry<String>("Joe", "PERSON", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Fred", "PERSON", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Boston", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("pub", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Vermont", "PLACE", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("IBM", "ORGANIZATION", 1.0));
    dictionary.addEntry(
        new DictionaryEntry<String>("Sally", "PERSON", 1.0));

}

An ExactDictionaryChunker instance will use this dictionary. The constructor of the ExactDictionaryChunker class takes the following arguments:

• Dictionary<String>: The dictionary containing the entities
• TokenizerFactory: The tokenizer factory used by the chunker
• boolean: If true, the chunker returns all matches
• boolean: If true, matches are case sensitive

Matches can overlap. For example, in the phrase "The First National Bank", the entity "bank" could be matched by itself or as part of the whole phrase. The third parameter determines whether all of these matches are returned.
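To make the effect of the two boolean arguments concrete, here is a minimal, library-free sketch of dictionary matching. It is not LingPipe's implementation: the class DictionaryMatchSketch, the method findMatches, the whitespace tokenization, and the categories assigned to the two entries are all assumptions made for illustration. When returnAllMatches is false, the sketch assumes a leftmost-longest policy for choosing among overlapping matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of LingPipe.
class DictionaryMatchSketch {

    /** Returns matches formatted as "matched text:CATEGORY", in text order. */
    static List<String> findMatches(Map<String, String> dictionary, String text,
                                    boolean returnAllMatches, boolean caseSensitive) {
        // Normalize the dictionary keys once if matching ignores case.
        Map<String, String> lookup = new LinkedHashMap<>();
        int maxTokens = 1;
        for (Map.Entry<String, String> e : dictionary.entrySet()) {
            String key = caseSensitive ? e.getKey() : e.getKey().toLowerCase(Locale.ROOT);
            lookup.put(key, e.getValue());
            maxTokens = Math.max(maxTokens, key.split(" ").length);
        }

        // Assumption: tokens are separated by whitespace only.
        String[] tokens = text.trim().split("\\s+");
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int longest = 0;
            // Try the longest candidate phrase first at each start position.
            for (int len = Math.min(maxTokens, tokens.length - i); len >= 1; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String key = caseSensitive ? phrase : phrase.toLowerCase(Locale.ROOT);
                String category = lookup.get(key);
                if (category != null) {
                    matches.add(phrase + ":" + category);
                    if (longest == 0) {
                        longest = len;
                    }
                    if (!returnAllMatches) {
                        break; // keep only the longest match at this position
                    }
                }
            }
            // Without overlaps, skip past the tokens consumed by the match.
            i += (returnAllMatches || longest == 0) ? 1 : longest;
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("First National Bank", "ORGANIZATION"); // illustrative category
        dictionary.put("bank", "PLACE");                       // illustrative category
        String text = "The First National Bank";

        // All matches, ignoring case: the full phrase and the single token "Bank".
        System.out.println(findMatches(dictionary, text, true, false));
        // No overlaps, ignoring case: only the full phrase.
        System.out.println(findMatches(dictionary, text, false, false));
        // All matches, case sensitive: "Bank" no longer matches the entry "bank".
        System.out.println(findMatches(dictionary, text, true, true));
    }
}
```

With returnAllMatches set to true, the overlapping single-token match is reported alongside the full phrase. With it set to false, only the longer phrase survives.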

In the following sequence, the dictionary is initialized. We then create an instance of the ExactDictionaryChunker class using the Indo-European tokenizer, returning all matches and ignoring the case of the tokens:

initializeDictionary();
ExactDictionaryChunker dictionaryChunker
    = new ExactDictionaryChunker(dictionary,
        IndoEuropeanTokenizerFactory.INSTANCE, true, false);
