Java Reference
In-Depth Information
Using regular expressions
Regular expressions can be difficult to understand. Simple expressions are not usually a
problem, but readability worsens as they become more complex. This is one of the
limitations of using regular expressions for SBD.
We will present two different regular expressions. The first expression is simple, but does
not do a very good job. It illustrates a solution that may be too simple for some problem
domains. The second is more sophisticated and does a better job.
In this example, we create a regular expression character class that matches periods,
question marks, and exclamation marks. The String class's split method is used to split
the text into sentences:
// paragraph holds the sample text to be split
String simple = "[.?!]";
String[] splitString = paragraph.split(simple);
for (String string : splitString) {
    System.out.println(string);
}
The output is as follows:
When determining the end of sentences we need to consider
several factors
Sentences may end with exclamation marks
Or possibly questions marks
Within sentences we may find numbers like 3
14159, abbreviations such as found in Mr
Smith, and possibly ellipses either within a sentence …, or
at the end of a sentence…
As expected, the method splits the paragraph at each of these characters, regardless of
whether they are part of a number or an abbreviation.
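To make this failure mode concrete, the following self-contained sketch applies the same character class to a short sample string. The class name SimpleSplitDemo, the method splitSimple, and the sample text are illustrative assumptions and are not part of the original example.

```java
// Hypothetical demo class; the names and the sample text are assumptions for illustration.
class SimpleSplitDemo {

    // Splits the text at every period, question mark, or exclamation mark.
    static String[] splitSimple(String text) {
        String simple = "[.?!]";
        return text.split(simple);
    }

    public static void main(String[] args) {
        String sample = "Pi is 3.14159. Ask Mr. Smith!";
        for (String piece : splitSimple(sample)) {
            // Brackets make any leading space in a piece visible.
            System.out.println("[" + piece + "]");
        }
    }
}
```

The sample holds two sentences, yet the split yields four pieces, because the decimal point and the period in "Mr." are treated as sentence terminators. The terminating characters themselves are discarded, and each piece that follows a space keeps that space at its start.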
A second approach follows, which produces better results. It has been adapted from an
example found at
http://stackoverflow.com/questions/5553410/regular-expression-match-a-sentence.
The Pattern class, which compiles the following regular expression, is used: