Chapter 4. Pattern Matching with Regular Expressions
Introduction
Suppose you have been on the Internet for a few years and have been very faithful about saving
all your correspondence, just in case you (or your lawyers, or the prosecution) need a
copy. The result is that you have a 5 GB disk partition dedicated to saved mail. And let's further
suppose that you remember that somewhere in there is an email message from someone
named Angie or Anjie. Or was it Angy? But you don't remember what you called it or where
you stored it. Obviously, you have to look for it.
But while some of you go and try to open up all 15,000,000 documents in a word processor,
I'll just find it with one simple command. Any system that provides regular expression support
allows me to search for the pattern in several ways. The simplest to understand is:
Angie|Anjie|Angy
which you can probably guess means just to search for any of the variations. A more concise
form (“more thinking, less typing”) is:
An[^ dn]
The syntax will become clear as we go through this chapter. Briefly, the “A” and the “n”
match themselves, in effect finding words that begin with “An”, while the cryptic [^ dn] requires
the “An” to be followed by a character other than (^ means not in this context) a space
(to eliminate the very common English word “an” at the start of a sentence) or “d” (to eliminate
the common word “and”) or “n” (to eliminate Anne, Announcing, etc.). Has your word
processor gotten past its splash screen yet? Well, it doesn't matter, because I've already
found the missing file. To find the answer, I just typed the command:
grep 'An[^ dn]' *
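Since this is a Java book, here is a rough sketch of the same two patterns tried out with the standard java.util.regex classes. The class name AngieSearch, the helper lineMatches, and the sample lines are all made up for illustration; only the two pattern strings come from the text above. Like grep, the helper reports a match if the pattern occurs anywhere in the line.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: the class name, helper method, and sample
// lines are hypothetical; the two regex strings are the ones in the text.
class AngieSearch {
    // The "simplest to understand" form: any one of three spellings.
    static final Pattern ALTERNATION = Pattern.compile("Angie|Anjie|Angy");
    // The concise form: "An" followed by any character except space, d, or n.
    static final Pattern CONCISE = Pattern.compile("An[^ dn]");

    // Like grep: true if the pattern occurs anywhere in the line.
    static boolean lineMatches(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {
            "Lunch on Friday? Angie",
            "An apple a day",
            "And then there were none",
            "Anne called about the invoice",
            "Regards, Anjie",
        };
        for (String line : lines) {
            System.out.printf("%-32s alternation=%b concise=%b%n", line,
                lineMatches(ALTERNATION, line), lineMatches(CONCISE, line));
        }
    }
}
```

With these sample lines, both patterns accept the Angie and Anjie lines and reject the other three. Keep in mind that the concise form is looser than the alternation: it would also accept a line containing, say, “Any”, which is the price of less typing.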