Using Java's regular expressions to find entities
To demonstrate how these expressions can be used, we will start with several simple
examples. The initial example starts with the following declaration. It is a simple expression
designed to identify certain types of phone numbers:
String phoneNumberRE = "\\d{3}-\\d{3}-\\d{4}";
We will use the following code to test our simple expressions. The Pattern class's compile
method takes a regular expression and compiles it into a Pattern object. Its matcher method
is then called with the target text and returns a Matcher object, which allows us to
repeatedly find matches of the regular expression:
Pattern pattern = Pattern.compile(phoneNumberRE);
Matcher matcher = pattern.matcher(regularExpressionText);
while (matcher.find()) {
    System.out.println(matcher.group() + " [" +
        matcher.start() + ":" + matcher.end() + "]");
}
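The find method returns true each time it locates another match. The Matcher's group method
returns the text that matched the expression. Its start and end methods give the position of
the matched text in the target text: start is the index of the first matched character, and
end is the index just past the last one.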
When executed, we will get the following output:
800-555-1234 [68:80]
123-555-1234 [196:208]
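The snippet above relies on a regularExpressionText variable whose contents are not shown in this section. As a minimal, self-contained sketch of the same technique, the following class wraps the loop in a helper and runs it on a short sample string. The class name PhoneNumberFinder, the findMatches helper, and the sample text are assumptions made for illustration, so the offsets differ from the output listed above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumberFinder {
    // Same expression as in the text: 3 digits, 3 digits, 4 digits, separated by hyphens.
    static final String PHONE_NUMBER_RE = "\\d{3}-\\d{3}-\\d{4}";

    // Hypothetical helper: returns each match formatted as "text [start:end]".
    static List<String> findMatches(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group() + " ["
                    + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample text; it stands in for regularExpressionText.
        String sampleText = "Call 800-555-1234 or 123-555-1234 today.";
        for (String match : findMatches(PHONE_NUMBER_RE, sampleText)) {
            System.out.println(match);
        }
        // Prints:
        // 800-555-1234 [5:17]
        // 123-555-1234 [21:33]
    }
}
```

Each reported range spans 12 characters, the length of a matched phone number, because end points one position past the final matched character.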
A number of other regular expressions can be used in a similar manner. These are listed in
the following table. The third column is the output produced when the corresponding
regular expression is used in the previous code sequence:
Entity type   Regular expression                                                                Output
URL           \\b(https?|ftp|file|ldap)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]   http://example.com [256:274]
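The URL expression can be exercised in the same way. The sketch below is an illustration only: the class name UrlFinder, the findMatches helper, and the sample sentence are hypothetical, and the expression is written as a Java string literal, which is why the backslash is doubled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFinder {
    // URL expression from the table, as a Java string literal.
    static final String URL_RE =
            "\\b(https?|ftp|file|ldap)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]";

    // Hypothetical helper: returns each match formatted as "text [start:end]".
    static List<String> findMatches(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group() + " ["
                    + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample text; note the sentence-ending period after the URL.
        String sampleText = "Docs live at http://example.com.";
        for (String match : findMatches(URL_RE, sampleText)) {
            System.out.println(match);
        }
        // Prints:
        // http://example.com [13:31]
    }
}
```

The final character class in the expression omits punctuation such as the period, comma, and semicolon, so the sentence-ending period in the sample text is left out of the match.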