Using the BreakIterator class
The BreakIterator class detects text boundaries such as those between characters, words,
sentences, and lines. A separate static factory method creates the iterator for each kind
of boundary:
• For characters, the getCharacterInstance method is used
• For words, the getWordInstance method is used
• For sentences, the getSentenceInstance method is used
• For lines, the getLineInstance method is used
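As a rough illustration of these four factory methods, the following sketch runs each kind of iterator over the same string and collects the pieces between successive boundaries. The class name BoundaryKinds, the helper method segments, and the sample text are assumptions made for this example only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BoundaryKinds {

    // Hypothetical helper: returns the substrings that lie between
    // successive boundaries reported by the given iterator.
    public static List<String> segments(BreakIterator iterator, String text) {
        List<String> parts = new ArrayList<>();
        iterator.setText(text);
        int start = iterator.first();
        for (int end = iterator.next();
                end != BreakIterator.DONE;
                end = iterator.next()) {
            parts.add(text.substring(start, end));
            start = end;
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "Hello, world! How are you?"; // sample text (assumption)
        Locale locale = Locale.US;

        System.out.println("characters: "
            + segments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("words:      "
            + segments(BreakIterator.getWordInstance(locale), text));
        System.out.println("sentences:  "
            + segments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println("lines:      "
            + segments(BreakIterator.getLineInstance(locale), text));
    }
}
```

Because every iterator reports a boundary at the start and at the end of the text, joining the returned pieces always reproduces the original string. Only the positions of the cuts differ from one kind of iterator to the next.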
Detecting breaks between characters matters when a single displayed character is composed
of several Unicode code points, such as ü. This character is sometimes formed by combining
\u0075 (u) with \u0308 (the combining diaeresis, ¨). The character iterator treats such a
sequence as one character. This capability is further detailed at
https://docs.oracle.com/javase/tutorial/i18n/text/char.html.
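To make the point about composed characters concrete, here is a minimal sketch that counts user-perceived characters by walking the character boundaries. The class name GraphemeCount, the method countCharacters, and the sample string are hypothetical. The string spells the u with a separate combining diaeresis (U+0308).

```java
import java.text.BreakIterator;

public class GraphemeCount {

    // Counts user-perceived characters by stepping through the
    // boundaries reported by a character BreakIterator.
    public static int countCharacters(String text) {
        BreakIterator iterator = BreakIterator.getCharacterInstance();
        iterator.setText(text);
        int count = 0;
        // After setText the cursor sits on the first boundary (index 0);
        // each successful call to next() passes one more character.
        while (iterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample text (assumption): u, combining diaeresis, then "ber"
        String composed = "u\u0308ber";
        System.out.println("char values: " + composed.length());
        System.out.println("characters:  " + countCharacters(composed));
    }
}
```

The String length method counts UTF-16 code units, so it reports five for this sample, whereas the iterator is expected to report four because it keeps the base letter and its combining mark together.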
The BreakIterator class can be used to detect the end of a sentence. It maintains a cursor
that references the current boundary, and its next and previous methods move that cursor
forward and backward through the text, respectively. BreakIterator has a single, protected
default constructor, so instances are obtained through its static factory methods. To
obtain an instance that detects the end of a sentence, use the static getSentenceInstance
method, as shown here:
BreakIterator sentenceIterator =
    BreakIterator.getSentenceInstance();
There is also an overloaded version of the method that takes a Locale instance as an
argument:
Locale currentLocale = new Locale("en", "US");
BreakIterator sentenceIterator =
    BreakIterator.getSentenceInstance(currentLocale);
Once an instance has been created, the setText method associates the text to be processed
with the iterator:
sentenceIterator.setText(paragraph);
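With the text attached, the cursor can be walked from boundary to boundary to pull out each sentence. The following is a minimal sketch of one way to do that, not necessarily the approach taken elsewhere in this text. The class name SentenceSplit, the method splitSentences, and the sample paragraph are assumptions, and Locale.US stands in for the locale built above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplit {

    // Hypothetical helper: splits a paragraph into sentences using the
    // boundaries reported by a sentence BreakIterator.
    public static List<String> splitSentences(String paragraph, Locale locale) {
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(locale);
        sentenceIterator.setText(paragraph);

        List<String> sentences = new ArrayList<>();
        int start = sentenceIterator.first(); // first boundary, index 0
        int end = sentenceIterator.next();    // boundary after the first sentence
        while (end != BreakIterator.DONE) {
            // Each span includes trailing whitespace, so trim it off.
            sentences.add(paragraph.substring(start, end).trim());
            start = end;
            end = sentenceIterator.next();
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample paragraph (assumption)
        String paragraph = "The first sentence is short. "
            + "The second one follows it. Is this the third?";
        for (String sentence : splitSentences(paragraph, Locale.US)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

The next method returns BreakIterator.DONE once the cursor has passed the last boundary, which is what ends the loop. Calling previous would move the cursor back through the same boundaries in reverse order.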