Introduction to NLP - Natural Language Processing with Java


This approach is useful for only the simplest problems.

When text is searched, a common technique is to use a data structure called an inverted index. Building one involves tokenizing the text and identifying terms of interest along with their positions. The terms and their positions are then stored in the inverted index. When a search is made for a term, it is looked up in the inverted index and the positional information is retrieved. This is faster than searching the document for the term each time it is needed. This data structure is used frequently in databases, information retrieval systems, and search engines.
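
The idea of an inverted index can be sketched in plain Java as follows. This is only a minimal illustration, not code from any particular library: the class name InvertedIndexSketch, the method positionsOf, and the naive split-based tokenization are all assumptions made for the example, and positions are counted in tokens rather than characters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical class for illustration only.
final class InvertedIndexSketch {
    // Maps each lower-cased term to the token positions where it occurs.
    private final Map<String, List<Integer>> index = new LinkedHashMap<>();

    InvertedIndexSketch(String text) {
        // Naive tokenization (an assumption): split on runs of characters
        // that are neither letters nor digits.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // a leading delimiter yields an empty first token
            }
            index.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            position++;
        }
    }

    // Looks the term up in the index instead of rescanning the text.
    List<Integer> positionsOf(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.<Integer>emptyList());
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx =
                new InvertedIndexSketch("The cat sat. The dog sat near the cat.");
        System.out.println(idx.positionsOf("cat"));  // prints [1, 8]
        System.out.println(idx.positionsOf("bird")); // prints []
    }
}
```

The text is scanned once when the index is built. Each later query is a single map lookup, which is why this is faster than searching the document every time.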

More sophisticated searches might involve responding to queries such as "Where are good restaurants in Boston?" To answer this query, we might need to perform entity recognition/resolution to identify the significant terms in the query, perform semantic analysis to determine the meaning of the query, and then search for and rank candidate responses.

To illustrate the process of finding names, we use a combination of a tokenizer and the OpenNLP TokenNameFinderModel class to find names in a text. Because this technique may throw an IOException, we wrap the code in a try-catch block. Declare this block and an array of strings holding the sentences, as shown here:

try {
    String[] sentences = {
        "Tim was a good neighbor. Perhaps not as good a Bob " +
        "Haywood, but still pretty good. Of course Mr. Adam " +
        "took the cake!"};
    // Insert code to find the names here
} catch (IOException ex) {
    ex.printStackTrace();
}

Before the sentences can be processed, we need to tokenize the text. Set up the tokenizer using the Tokenizer interface and the SimpleTokenizer instance, as shown here:

Tokenizer tokenizer = SimpleTokenizer.INSTANCE;
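
To give a feel for what a simple tokenizer produces, here is a rough, library-free sketch of tokenizing by character class (whitespace, letters, digits, everything else). It is not OpenNLP's implementation and will differ from SimpleTokenizer in edge cases, for example in how runs of mixed punctuation are split. The class name CharClassTokenizerSketch and its tokenize method are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration only; not part of OpenNLP.
final class CharClassTokenizerSketch {
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Starts a new token whenever the character class changes;
    // whitespace separates tokens and is never emitted.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start of the current token, or -1 when between tokens
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass current = classify(text.charAt(i));
            if (current != previous) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));
                }
                start = (current == CharClass.WHITESPACE) ? -1 : i;
            }
            previous = current;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Of, course, Mr, ., Adam, took, the, cake, !]
        System.out.println(tokenize("Of course Mr. Adam took the cake!"));
    }
}
```

Note that punctuation becomes its own token, so "Mr." is split into "Mr" and ".". A name finder working on such tokens sees the period as a separate item.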

We will need to use a model to detect sentences. This is needed to avoid grouping terms that may span sentence boundaries. We will use the TokenNameFinderModel class
