Using Java's regular expressions to find entities
To demonstrate how these expressions can be used, we will start with several simple examples. The first example uses the following declaration, a simple expression designed to identify phone numbers of the form 800-555-1234, that is, three digits, a hyphen, three digits, a hyphen, and four digits:
String phoneNumberRE = "\\d{3}-\\d{3}-\\d{4}";
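The following sketch shows which strings this expression accepts. Because the expression is written as a Java string literal, each `\\d` becomes the regex token `\d` (a single digit), and `{3}` means "exactly three". `Pattern.matches` requires the whole input to match. The class name `PhoneRegexCheck`, the helper `isPhoneNumber`, and the test strings are hypothetical illustrations rather than part of the original example.

```java
import java.util.regex.Pattern;

public class PhoneRegexCheck {
    // Same expression as phoneNumberRE above: ddd-ddd-dddd
    static final String PHONE_NUMBER_RE = "\\d{3}-\\d{3}-\\d{4}";

    // Hypothetical helper: true only if the ENTIRE string has the ddd-ddd-dddd shape
    static boolean isPhoneNumber(String s) {
        return Pattern.matches(PHONE_NUMBER_RE, s);
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("800-555-1234"));   // true
        // Parentheses and the space are not covered by the expression
        System.out.println(isPhoneNumber("(800) 555-1234")); // false
        // Wrong digit grouping: a hyphen is required after the second group of three digits
        System.out.println(isPhoneNumber("800-5551-234"));   // false
    }
}
```

This is why the text describes the expression as covering only certain types of phone numbers: other common layouts are simply not matched.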
We will use the following code to test our simple expressions. The Pattern class's compile method takes a regular expression and compiles it into a Pattern object. That object's matcher method is then applied to the target text and returns a Matcher object, which lets us repeatedly locate matches of the regular expression:
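import java.util.regex.Matcher;
import java.util.regex.Pattern;

// regularExpressionText is assumed to hold the sample text being searched
Pattern pattern = Pattern.compile(phoneNumberRE);
Matcher matcher = pattern.matcher(regularExpressionText);
// Each call to find() advances to the next match in the text
while (matcher.find()) {
    // Print the matched text followed by its [start:end] offsets
    System.out.println(matcher.group() + " [" +
            matcher.start() + ":" + matcher.end() + "]");
}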
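The Matcher's find method returns true each time it locates another match. Its group method returns the text that matched the expression, and its start and end methods give the position of the matched text in the target text: start is the index of the first matched character, and end is the index just past the last one.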
When executed, we will get the following output:
800-555-1234 [68:80]
123-555-1234 [196:208]
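The book's target text, regularExpressionText, is not reproduced here, so the following self-contained sketch runs the same loop over a short hypothetical sample string. The class name `PhoneNumberFinder`, the helper `findAll`, and `SAMPLE_TEXT` are illustrative assumptions; because the sample text differs, the offsets differ from the output above. The sketch also shows that end is exclusive, so the slice from start to end is exactly the matched text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneNumberFinder {
    static final String PHONE_NUMBER_RE = "\\d{3}-\\d{3}-\\d{4}";

    // Hypothetical sample text standing in for regularExpressionText
    static final String SAMPLE_TEXT =
            "Call 800-555-1234 for sales or 123-555-1234 for support.";

    // Collects every match as "text [start:end]", the format printed in the loop above
    static List<String> findAll(String regex, String text) {
        Pattern pattern = Pattern.compile(regex);
        Matcher matcher = pattern.matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            // end() is exclusive, so text.substring(start(), end()) equals group()
            String matched = text.substring(matcher.start(), matcher.end());
            results.add(matched + " [" + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        for (String result : findAll(PHONE_NUMBER_RE, SAMPLE_TEXT)) {
            System.out.println(result);
        }
        // Prints:
        // 800-555-1234 [5:17]
        // 123-555-1234 [31:43]
    }
}
```

Wrapping the loop in a helper that takes the expression as a parameter makes it easy to reuse with the other expressions in the table.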
A number of other regular expressions can be used in the same way. They are listed in the following table, where the third column shows the output produced when the corresponding regular expression is used in the previous code sequence:
Entity type    Regular expression                                                                   Output
URL            \\b(https?|ftp|file|ldap)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]      http://example.com [256:274]
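As an illustration of the URL expression from the table, the following sketch applies it to a hypothetical sample sentence. The class name `UrlFinder`, the helper `findAll`, and `SAMPLE_TEXT` are assumptions made for this example. Note how the final character class, which omits punctuation such as "," and ".", keeps trailing sentence punctuation out of the match: the greedy `*` first consumes it, then backtracks until the last character is one the final class allows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFinder {
    // URL expression from the table, written as a Java string literal
    static final String URL_RE =
            "\\b(https?|ftp|file|ldap)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]";

    // Hypothetical sample text; both URLs are followed by punctuation
    static final String SAMPLE_TEXT =
            "Visit http://example.com, or fetch ftp://ftp.example.org/readme.txt.";

    // Same loop as in the text, collecting "match [start:end]" strings
    static List<String> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group() + " [" + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        for (String result : findAll(URL_RE, SAMPLE_TEXT)) {
            System.out.println(result);
        }
        // Prints:
        // http://example.com [6:24]
        // ftp://ftp.example.org/readme.txt [35:67]
    }
}
```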