Java Reference
In-Depth Information
Entity type: ZIP code
Regular expression: [0-9]{5}(\\-?[0-9]{4})?
Output: 12345-1234 [150:160]

Entity type: E-mail
Regular expression: [a-zA-Z0-9'._%+-]+@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,4}
Output: rgb@colorworks.com [27:45]

Entity type: Time
Regular expression: (([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?
Output: 8:00 [217:221]
        4:30 [229:233]

Entity type: Date
Regular expression: ((0?[13578]|10|12)(-|\\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[01]?))(-|\\/)((19)([2-9])(\\d{1})|(20)([01])(\\d{1})|([8901])(\\d{1}))|(0?[2469]|11)(-|\\/)(([1-9])|(0[1-9])|([12])([0-9]?)|(3[0]?))(-|\\/)((19)([2-9])(\\d{1})|(20)([01])(\\d{1})|([8901])(\\d{1})))
Output: 2/25/1954 [315:324]
There are many other regular expressions that we could have used. However, these
examples illustrate the basic technique. As demonstrated with the date regular
expression, some of these can be quite complex.
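The basic technique can be sketched as follows, using the ZIP code expression from the table. This is a minimal illustration rather than the original listing: the class name ZipCodeFinder, the helper method find, and the sample text are assumptions made for this example. Each match is reported with its start and end offsets in the same [start:end] form as the table's Output column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipCodeFinder {
    // ZIP code expression taken from the table.
    static final String ZIP_CODE_RE = "[0-9]{5}(\\-?[0-9]{4})?";

    // Returns each match followed by its [start:end] character offsets.
    static List<String> find(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group()
                + " [" + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not the text used in the original example.
        String text = "Ship to 12345-1234 or to 54321.";
        find(ZIP_CODE_RE, text).forEach(System.out::println);
    }
}
```

The end offset returned by Matcher.end() is exclusive, which is why a ten-character ZIP code spans ten positions in the reported range.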
It is common for regular expressions to miss some entities and to falsely report
non-entities as entities. For example, suppose we replace the text with the following string:
regularExpressionText =
"(888)555-1111 888-SEL-HIGH 888-555-2222-J88-W3S";
Executing the code will return this:
888-555-2222 [27:39]
It missed the first two phone numbers and falsely reported the "part number" as a phone
number.
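The behavior described here can be reproduced with a short sketch. The phone number expression is not shown in this section, so the pattern \\d{3}-\\d{3}-\\d{4} below is an assumption, as are the class name PhoneNumberFinder and the stricter variant, which is offered only as one possible way to reject a number that is embedded in a longer hyphenated token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneNumberFinder {
    // Assumed phone number expression; it is not shown in this section.
    static final String PHONE_NUMBER_RE = "\\d{3}-\\d{3}-\\d{4}";

    // Hypothetical stricter variant: the number may not be directly
    // preceded or followed by a word character or a hyphen.
    static final String STRICT_PHONE_NUMBER_RE =
        "(?<![\\w-])\\d{3}-\\d{3}-\\d{4}(?![\\w-])";

    // Returns each match followed by its [start:end] character offsets.
    static List<String> find(String regex, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            results.add(matcher.group()
                + " [" + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        String regularExpressionText =
            "(888)555-1111 888-SEL-HIGH 888-555-2222-J88-W3S";
        // The simple pattern reports the part number as a phone number.
        System.out.println(find(PHONE_NUMBER_RE, regularExpressionText));
        // The stricter pattern rejects it, but still misses the first two numbers.
        System.out.println(find(STRICT_PHONE_NUMBER_RE, regularExpressionText));
    }
}
```

Tightening the pattern removes the false positive but does nothing for the two missed numbers. Catching those would require additional alternatives for the parenthesized and lettered formats.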
We can also search for more than one regular expression at a time using the | operator. In
the following statement, three regular expressions are combined using this operator. They
are declared using the corresponding entries in the previous table:
Pattern pattern = Pattern.compile(phoneNumberRE + "|"
+ timeRE + "|" + emailRegEx);
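A self-contained sketch of the combined search might look like this. The timeRE and emailRegEx values come from the table, while the phoneNumberRE value, the class name CombinedEntityFinder, and the sample text are assumptions made for illustration. Because | has the lowest precedence among regular expression operators, the three expressions can be concatenated without extra grouping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedEntityFinder {
    // Assumed phone number expression; it is not shown in this section.
    static final String phoneNumberRE = "\\d{3}-\\d{3}-\\d{4}";
    // Time and e-mail expressions taken from the table.
    static final String timeRE =
        "(([0-1]?[0-9])|([2][0-3])):([0-5]?[0-9])(:([0-5]?[0-9]))?";
    static final String emailRegEx =
        "[a-zA-Z0-9'._%+-]+@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,4}";

    // Searches for all three entity types in a single pass over the text.
    static List<String> findAll(String text) {
        Pattern pattern = Pattern.compile(phoneNumberRE + "|"
            + timeRE + "|" + emailRegEx);
        Matcher matcher = pattern.matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group()
                + " [" + matcher.start() + ":" + matcher.end() + "]");
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, not the text used in the original example.
        String text = "Call 888-555-2222 at 8:00 or write to rgb@colorworks.com";
        findAll(text).forEach(System.out::println);
    }
}
```

A combined pattern reports matches in the order they appear in the text, but it does not say which of the three expressions produced each match. If the entity type is needed, one option is to run the expressions separately or to wrap each one in a named group.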