While this is a straightforward process that is easy to code, the hard work is in defining the pattern to
achieve the result that you want. This is an extensive topic, since in their full glory regular expressions
are immensely powerful and can get very complicated. There are books devoted entirely to this, so our
aim will be to get enough of a bare-bones understanding of how regular expressions work that you will be
in a good position to look into the subject in more depth if you need to. Although regular expressions
can look quite fearsome, don't be put off. They are always built step by step, so although the end result
may look complicated and obscure, they are not at all difficult to put together. Regular expressions are a
lot of fun and a sure way to impress your friends and maybe confound your enemies.
Defining Regular Expressions
You may not have heard of regular expressions before reading this book and therefore may think you
have never used them. If so, you are almost certainly wrong. Whenever you search a directory for files
of a particular type, "*.java" for instance, you are using a form of regular expression. However, to say
that regular expressions can do much more than this is something of an understatement. To get an
understanding of what we can do with regular expressions, we will start at the bottom with the simplest
kind of operation and work our way up to some of the more complex problems they can solve.
Creating a Pattern
In its most elementary form, a regular expression just does a simple search for a substring. For example,
if we want to search a string for the word had, the regular expression is exactly that. So the string
defining this particular regular expression is "had". Let's use this as a vehicle for understanding the
programming mechanism for using regular expressions. We can create a Pattern object for our
expression "had" with the statement:
Pattern had = Pattern.compile("had");
The static compile() method in the Pattern class returns a reference to a Pattern object that
contains the compiled regular expression. The method will throw an exception of type
PatternSyntaxException if the regular expression passed as the argument is invalid. However, you
don't have to catch this exception as it is a subclass of RuntimeException and therefore is
unchecked. The compilation process stores the regular expression in a Pattern object in a form that is
ready to be processed by a Matcher state-machine.
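As a minimal sketch of the mechanism just described, the following program compiles the "had" pattern, counts its occurrences in a string, and shows that an invalid expression produces a PatternSyntaxException. The class name HadSearch and the methods countHad() and isValidRegex() are hypothetical names chosen for illustration, and the use of a Matcher object with its find() method goes slightly beyond what has been covered so far.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original text.
public class HadSearch {

    // Counts the non-overlapping occurrences of "had" in the given text.
    static int countHad(String text) {
        Pattern had = Pattern.compile("had");   // compile the regular expression
        Matcher m = had.matcher(text);          // a Matcher applies the pattern to the text
        int count = 0;
        while (m.find()) {                      // find() advances to the next match, if any
            count++;
        }
        return count;
    }

    // Returns true if the string compiles as a regular expression.
    // Catching PatternSyntaxException is optional because it is unchecked.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(countHad("He had had enough"));  // prints 2
        System.out.println(isValidRegex("had"));            // prints true
        System.out.println(isValidRegex("(had"));           // prints false: unclosed group
    }
}
```

Catching the exception is only worthwhile when the expression comes from outside the program, such as user input. For a literal like "had" that you know to be valid, there is nothing to gain from the try block.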
There's a further version of the compile() method that enables you to control more closely how the
pattern will be applied when looking for a match. The second argument is a value of type int that
specifies one or more of the following flags that are defined in the Pattern class:
CASE_INSENSITIVE
Matches ignoring case, but assumes only US-ASCII characters are being
matched.
MULTILINE
Enables the beginning or end of a line to be matched at any line within
the input. Without this flag only the beginning and end of the entire
sequence will be matched.
UNICODE_CASE
When this is specified in addition to CASE_INSENSITIVE, case-insensitive
matching will be consistent with the Unicode standard.
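The sketch below shows one way the flags above might be used with the two-argument compile() method. Because the flags are int values, more than one can be combined with the bitwise OR operator. The class name FlagDemo and its method names are hypothetical, and the ^ character in the MULTILINE example (which anchors a match to the beginning of a line) is a piece of pattern syntax not yet introduced, so treat that part as a preview.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original text.
public class FlagDemo {

    // True if text contains "had" in any mix of upper and lower case.
    static boolean containsHadIgnoringCase(String text) {
        Pattern p = Pattern.compile("had", Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    // Counts matches of "^had" with the given flags (0 means no flags).
    static int countHadAtLineStart(String text, int flags) {
        Matcher m = Pattern.compile("^had", flags).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Case-insensitive search with caller-supplied flags, for comparing
    // CASE_INSENSITIVE alone against CASE_INSENSITIVE | UNICODE_CASE.
    static boolean found(String regex, String text, int flags) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHadIgnoringCase("HAD enough"));   // prints true

        String lines = "had one\nHe had two\nhad three";
        System.out.println(countHadAtLineStart(lines, 0));                  // prints 1
        System.out.println(countHadAtLineStart(lines, Pattern.MULTILINE));  // prints 2

        // \u00e9 is a lowercase e with acute accent, \u00c9 is the uppercase form.
        System.out.println(found("\u00e9", "\u00c9", Pattern.CASE_INSENSITIVE));  // prints false
        System.out.println(found("\u00e9", "\u00c9",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));                // prints true
    }
}
```

The last two calls illustrate why UNICODE_CASE exists: on its own, CASE_INSENSITIVE only folds case for US-ASCII letters, so an accented character matches only itself.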