Using the BreakIterator class
The BreakIterator class can be used to detect various text boundaries, such as those between characters, words, sentences, and lines. Different methods are used to create different instances of the BreakIterator class, as follows:
• For characters, the getCharacterInstance method is used
• For words, the getWordInstance method is used
• For sentences, the getSentenceInstance method is used
• For lines, the getLineInstance method is used
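As a minimal sketch of how the four factory methods differ, the following example runs each kind of iterator over the same string and counts the segments it finds. The class name BoundaryKinds, the helper countSegments, and the sample text are our own illustrative choices, not part of the BreakIterator API.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BoundaryKinds {
    // Counts the segments between consecutive boundaries reported by the iterator.
    static int countSegments(BreakIterator iterator, String text) {
        iterator.setText(text);
        iterator.first(); // start at the first boundary (index 0)
        int count = 0;
        while (iterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Hello there. How are you?"; // sample text (assumption)
        Locale locale = Locale.US;
        System.out.println("characters: "
            + countSegments(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("words:      "
            + countSegments(BreakIterator.getWordInstance(locale), text));
        System.out.println("sentences:  "
            + countSegments(BreakIterator.getSentenceInstance(locale), text));
        System.out.println("lines:      "
            + countSegments(BreakIterator.getLineInstance(locale), text));
    }
}
```

Note that the word iterator also reports boundaries around spaces and punctuation, so its segment count is usually larger than the number of actual words.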
Detecting breaks between characters is important at times, for example, when we need to process characters that are composed of multiple Unicode code points, such as ü. This character is sometimes formed by combining \u0075 (u) with the combining diaeresis \u0308 (¨). The class will identify these types of characters. This capability is further detailed at
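To illustrate character boundary detection, here is a small sketch that compares the length of a string with the number of characters a user would perceive. It assumes the decomposed form of ü, that is, the letter u (\u0075) followed by the combining diaeresis (\u0308). The class name GraphemeCount and the method countUserCharacters are hypothetical names chosen for this example.

```java
import java.text.BreakIterator;

class GraphemeCount {
    // Counts user-perceived characters by stepping through character boundaries.
    static int countUserCharacters(String text) {
        BreakIterator characterIterator = BreakIterator.getCharacterInstance();
        characterIterator.setText(text);
        characterIterator.first();
        int count = 0;
        while (characterIterator.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String combined = "u\u0308"; // u followed by a combining diaeresis
        // length() counts UTF-16 code units, not perceived characters
        System.out.println("length():           " + combined.length());
        System.out.println("perceived characters: " + countUserCharacters(combined));
    }
}
```

The base letter and its combining mark are kept together as one unit, which is what cursor movement or character counting in a text editor generally needs.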
The BreakIterator class can be used to detect the end of a sentence. It uses a cursor that references the current boundary, and it supports next and previous methods that move the cursor forward and backward in the text, respectively.
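The cursor behavior can be sketched as follows. The example moves forward twice with next and then back once with previous, recording the boundary index returned at each step. The class name CursorDemo, the method walk, and the sample text are assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.Arrays;
import java.util.Locale;

class CursorDemo {
    // Returns the boundary indexes visited by first(), next(), next(), previous().
    static int[] walk(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);
        int start = iterator.first();   // first boundary, always index 0
        int afterFirst = iterator.next();   // boundary after the first sentence
        int afterSecond = iterator.next();  // boundary after the second sentence
        int back = iterator.previous();     // cursor moves back one boundary
        return new int[] {start, afterFirst, afterSecond, back};
    }

    public static void main(String[] args) {
        String text = "First one. Second one."; // sample text (assumption)
        System.out.println(Arrays.toString(walk(text)));
    }
}
```

Both next and previous return BreakIterator.DONE when the cursor is already at the last or first boundary, so loops over boundaries typically test for that value.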
BreakIterator is an abstract class with a single, protected default constructor, so instances are obtained through its static factory methods. To obtain an instance of the BreakIterator class to detect the end of a sentence, use the static getSentenceInstance method, as shown here:
BreakIterator sentenceIterator =
    BreakIterator.getSentenceInstance();
There is also an overloaded version of the method. It takes a Locale instance as an argument:
Locale currentLocale = new Locale("en", "US");
BreakIterator sentenceIterator =
    BreakIterator.getSentenceInstance(currentLocale);
Once an instance has been created, the setText method will associate the text to be processed with the iterator:
sentenceIterator.setText(paragraph);
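Putting these steps together, one possible way to extract the sentences is to walk the boundaries and take the substring between each consecutive pair. This is a sketch under our own assumptions: the class name SentenceSplitter, the method split, and the sample paragraph are hypothetical, and trimming the trailing whitespace is a design choice rather than something BreakIterator does itself.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitter {
    // Splits a paragraph into sentences using the boundaries BreakIterator reports.
    static List<String> split(String paragraph, Locale locale) {
        BreakIterator sentenceIterator = BreakIterator.getSentenceInstance(locale);
        sentenceIterator.setText(paragraph);

        List<String> sentences = new ArrayList<>();
        int start = sentenceIterator.first();
        int end = sentenceIterator.next();
        while (end != BreakIterator.DONE) {
            // Each boundary falls after the whitespace that follows a sentence,
            // so trim the extracted text.
            sentences.add(paragraph.substring(start, end).trim());
            start = end;
            end = sentenceIterator.next();
        }
        return sentences;
    }

    public static void main(String[] args) {
        String paragraph = "The sky is blue. Is it raining? No."; // sample text (assumption)
        for (String sentence : split(paragraph, Locale.US)) {
            System.out.println(sentence);
        }
    }
}
```

Text containing abbreviations such as titles or initials may be split in unexpected places, so results on real paragraphs should be checked rather than assumed.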