Using the SentenceChunker class
An alternative approach is to use the SentenceChunker class to perform SBD. The
constructor of this class requires a TokenizerFactory object and a
SentenceModel object, as shown here:
TokenizerFactory tokenizerfactory =
    IndoEuropeanTokenizerFactory.INSTANCE;
SentenceModel sentenceModel =
    new IndoEuropeanSentenceModel();
The SentenceChunker instance is created using the tokenizer factory and sentence
model instances:
SentenceChunker sentenceChunker =
    new SentenceChunker(tokenizerfactory, sentenceModel);
The SentenceChunker class implements the Chunker interface, which declares a chunk
method. This method returns an object that implements the Chunking interface. This
object specifies "chunks" of text over a character sequence (CharSequence).
The chunk method uses a character array and indexes within the array to specify which
portions of the text need to be processed. A Chunking object is returned like this:
Chunking chunking = sentenceChunker.chunk(
    paragraph.toCharArray(), 0, paragraph.length());
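To make the role of the two indexes concrete, here is a minimal sketch in plain Java that does not use LingPipe. It assumes the usual convention that the start index is inclusive and the end index is exclusive, which is consistent with passing paragraph.length() as the end. The ChunkRangeDemo class and its selectedText helper are hypothetical names introduced only for this illustration:

```java
// Hypothetical helper, not part of LingPipe: shows which characters a
// (char[], start, end) range covers when start is inclusive and end is exclusive.
class ChunkRangeDemo {

    // Returns the text between start (inclusive) and end (exclusive).
    static String selectedText(char[] cs, int start, int end) {
        return new String(cs, start, end - start);
    }

    public static void main(String[] args) {
        String paragraph = "One. Two.";
        char[] cs = paragraph.toCharArray();

        // The whole paragraph, as in the chunk call above.
        System.out.println(selectedText(cs, 0, paragraph.length()));

        // Only the second sentence.
        System.out.println(selectedText(cs, 5, 9));
    }
}
```

Passing 0 and paragraph.length() selects the entire text, while a narrower pair of indexes would restrict processing to part of it.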
We will use the Chunking object for two purposes. First, we will use its chunkSet
method to return a Set of Chunk objects. Then we will obtain a string holding all the
sentences:
Set<Chunk> sentences = chunking.chunkSet();
String slice = chunking.charSequence().toString();
A Chunk object stores the character offsets of a sentence's boundaries. We will use
its start and end methods in conjunction with the slice to display the sentences, as
shown next. Each element, sentence, holds one sentence's boundaries. We use this
information to display each sentence in the slice:
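As a minimal sketch of this offset-based display, the following self-contained example runs without the LingPipe library. It uses a hypothetical Span class as a stand-in for Chunk, keeping only the start and end offsets, and the offsets are written by hand rather than produced by a SentenceChunker. The SentenceSliceDemo class and its extractSentences method are likewise names assumed for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class SentenceSliceDemo {

    // Hypothetical stand-in for LingPipe's Chunk: it keeps only the start
    // (inclusive) and end (exclusive) character offsets of one sentence.
    static final class Span {
        private final int start;
        private final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        int start() { return start; }
        int end() { return end; }
    }

    // Cuts one substring out of the slice for each span.
    static List<String> extractSentences(String slice, List<Span> spans) {
        List<String> result = new ArrayList<>();
        for (Span sentence : spans) {
            result.add(slice.substring(sentence.start(), sentence.end()));
        }
        return result;
    }

    public static void main(String[] args) {
        String slice = "This is the first sentence. This is the second one.";
        // Offsets written by hand for illustration only.
        List<Span> spans = List.of(new Span(0, 27), new Span(28, 51));
        for (String sentence : extractSentences(slice, spans)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

With the real API, the same loop would iterate over the Set of Chunk objects and call start and end on each element in the same way.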