Finding Sentences - Natural Language Processing with Java


    paragraph = "The colour of money is green. Common fraction "
        + "characters such as ½ are converted to the long form 1/2. "
        + "Quotes such as “cat” are converted to their simpler form.";
    // The options string turns on American spelling, fraction
    // expansion, and ASCII quotes during tokenization
    ptb = new PTBTokenizer(
        new StringReader(paragraph), new CoreLabelTokenFactory(),
        "americanize=true,normalizeFractions=true,asciiQuotes=true");
    wtsp = new WordToSentenceProcessor();
    sents = wtsp.process(ptb.tokenize());
    for (List<CoreLabel> sent : sents) {
        for (CoreLabel element : sent) {
            System.out.print(element + " ");
        }
        System.out.println();
    }

The output is as follows:

    The color of money is green .
    Common fraction characters such as 1/2 are converted to the long form 1/2 .
    Quotes such as " cat " are converted to their simpler form .

The British spelling of the word "colour" was converted to its American equivalent. The fraction ½ was expanded to three characters: 1/2. In the last sentence, the smart quotes were converted to their simpler form.

Using the DocumentPreprocessor class

When an instance of the DocumentPreprocessor class is created, it uses its Reader parameter to produce a list of sentences. It also implements the Iterable interface, which makes it easy to traverse the list.

In the following example, the paragraph is used to create a StringReader object, and this object is used to instantiate the DocumentPreprocessor instance:
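The listing itself is not part of this excerpt, so the following is only a minimal sketch of the pattern described above, not the book's own code. It assumes the Stanford NLP library is on the classpath. The class name DocumentPreprocessorSketch, the helper method splitSentences, and the sample paragraph are hypothetical names introduced for illustration.

```java
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.process.DocumentPreprocessor;

import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class for illustration; not taken from the book
public class DocumentPreprocessorSketch {

    // Hypothetical helper: collects the sentences that a
    // DocumentPreprocessor yields for the given text
    static List<List<HasWord>> splitSentences(String text) {
        // Wrap the text in a Reader, as the constructor expects
        Reader reader = new StringReader(text);
        DocumentPreprocessor preprocessor = new DocumentPreprocessor(reader);

        // DocumentPreprocessor is Iterable, so a for-each loop
        // visits one tokenized sentence at a time
        List<List<HasWord>> sentences = new ArrayList<>();
        for (List<HasWord> sentence : preprocessor) {
            sentences.add(sentence);
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample text chosen for this sketch only
        String paragraph = "The first sentence. The second sentence.";
        for (List<HasWord> sentence : splitSentences(paragraph)) {
            System.out.println(sentence);
        }
    }
}
```

Each element produced by the loop is a list of tokens for one sentence, so the same traversal can feed later processing steps without an explicit tokenizer or sentence processor.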
