To express these more complex patterns, regexes use metacharacters. These are special
characters that indicate that special processing is required. This is similar to the
use of the * character in the Unix or Windows shell, where the * is not interpreted
literally but instead means “anything.” For example, to list all the Java source files
in the current directory on Unix, we would issue the command:
ls *.java
The metacharacters of regexes serve a similar purpose, but there are far more of them,
and they are far more flexible than the set available in shells. They also have
different meanings than they do in shell scripts, so don't get confused: in a regex,
* on its own does not mean “anything,” but “zero or more of the preceding item.”
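To make the contrast concrete, here is a minimal sketch that compares the shell-style glob *.java with a regex that does the same job, using the Pattern class described shortly. It assumes Java 11 or later (for Path.of) and uses the JDK's PathMatcher to evaluate the glob; the class and method names are hypothetical, chosen only for illustration. It also shows that the glob text itself is not even a valid regex, because a leading * has nothing to repeat.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class: compares glob and regex meanings of '*'
public class GlobVsRegex {
    // Shell-style glob: '*' means "any run of characters"
    static boolean globMatches(String fileName) {
        PathMatcher glob = FileSystems.getDefault().getPathMatcher("glob:*.java");
        return glob.matches(Path.of(fileName));
    }

    // Regex equivalent: '.' is "any one character", '*' is "zero or more
    // of the preceding item", and "\\." is a literal dot
    static boolean regexMatches(String fileName) {
        return Pattern.compile(".*\\.java").matcher(fileName).matches();
    }

    // In a regex, a leading '*' has nothing to repeat, so compiling
    // the glob text as a regex throws PatternSyntaxException
    static boolean globIsValidRegex() {
        try {
            Pattern.compile("*.java");
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(globMatches("Hello.java"));   // true
        System.out.println(regexMatches("Hello.java"));  // true
        System.out.println(regexMatches("Hello.javax")); // false
        System.out.println(globIsValidRegex());          // false
    }
}
```

The regex needs two metacharacters (. and *) plus an escaped dot to express what the shell writes as a single *.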
Let's look at a couple of examples. Suppose we want to have a spell-checking program
that is relaxed about the difference in spelling between British and American
English. This means that honor and honour should both be accepted as valid spelling
choices. This is easy to do with regular expressions.
Java uses a class called Pattern (from the package java.util.regex) to represent a
regex. This class can't be directly instantiated, however. Instead, new instances are
created by using a static factory method, compile(). From a pattern, we then derive
a Matcher for a particular input string, which we can use to explore that string.
For example, let's examine a bit of Shakespeare from the play Julius Caesar:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "u?" makes the u optional, so both spellings match
Pattern p = Pattern.compile("honou?r");
String caesarUK = "For Brutus is an honourable man";
Matcher mUK = p.matcher(caesarUK);
String caesarUS = "For Brutus is an honorable man";
Matcher mUS = p.matcher(caesarUS);
// find() searches for the pattern anywhere in the input
System.out.println("Matches UK spelling? " + mUK.find());
System.out.println("Matches US spelling? " + mUS.find());
Be careful when using Matcher, as it also has a method called
matches(). However, this method indicates whether the pattern
covers the entire input string. It will return false if the
pattern matches only part of the input, such as a word in the
middle of a sentence. To search anywhere in the input, use find().
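The difference between find() and matches() is easy to check directly. The following sketch reuses the honou?r pattern from the Julius Caesar example; the class and method names are hypothetical. It also shows that, after a successful find(), the Matcher's group() and start() methods report what matched and where.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: contrasts find() with matches()
public class FindVersusMatches {
    static final Pattern HONOUR = Pattern.compile("honou?r");

    // find() scans for the pattern anywhere in the input
    static boolean foundAnywhere(String input) {
        return HONOUR.matcher(input).find();
    }

    // matches() succeeds only if the pattern covers the whole input
    static boolean matchesWhole(String input) {
        return HONOUR.matcher(input).matches();
    }

    public static void main(String[] args) {
        String line = "For Brutus is an honourable man";
        System.out.println(foundAnywhere(line));    // true
        System.out.println(matchesWhole(line));     // false
        System.out.println(matchesWhole("honour")); // true

        // After a successful find(), the Matcher records the matched text and its position
        Matcher m = HONOUR.matcher(line);
        if (m.find()) {
            System.out.println(m.group() + " at index " + m.start()); // honour at index 17
        }
    }
}
```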
The last example introduces our first regex metacharacter, ?, in the pattern honou?r.
This means “the preceding character is optional,” so both honour and honor will
match. Let's look at another example. Suppose we want to match both minimize and
minimise (the latter spelling is more common in British English). We can use square
brackets [] to indicate that any one character from a set (but only one of the
alternatives) can be used at that position, like this:
Pattern p = Pattern.compile("minimi[sz]e");
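As a quick check of how the character class behaves, here is a small sketch that filters a hypothetical list of words with minimi[sz]e, and, for comparison, with minimi[sz]?e, which combines the class with the ? metacharacter from the previous example. The word list and names are illustrative only. A class like [sz] must match exactly one character, so a word with neither letter or with both is rejected unless the class is made optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class: character classes and the ? metacharacter
public class SpellingVariants {
    // [sz] matches exactly one character: either 's' or 'z'
    static final Pattern MINIMISE = Pattern.compile("minimi[sz]e");
    // [sz]? makes that single character optional
    static final Pattern MINIMISE_OPTIONAL = Pattern.compile("minimi[sz]?e");

    // Returns the words that the pattern matches in full
    static List<String> accepted(Pattern p, List<String> words) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (p.matcher(w).matches()) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("minimize", "minimise", "minimie", "minimisze");
        System.out.println(accepted(MINIMISE, words));          // [minimize, minimise]
        System.out.println(accepted(MINIMISE_OPTIONAL, words)); // [minimize, minimise, minimie]
    }
}
```

Note that neither pattern accepts minimisze: a class, optional or not, stands for at most one character.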