Other useful wildcards include simple sets (p[aeiou]p matches pop and pup but not pgp, while [a-z] matches any single lowercase letter); negations ([^aeiou] matches any single character that is not a lowercase vowel); predefined sets (\d matches any digit; \s any whitespace character); and boundaries (^twisty matches the word "twisty" only at the beginning of a line; \balike matches "alike" only after a word boundary, that is, at the beginning of a word).
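The wildcards above can be tried out with java.util.regex.Pattern. The following is a minimal sketch rather than a definitive recipe: the class name WildcardDemo and the helpers matchesAll and foundIn are hypothetical names chosen for illustration. The backslashes in \d, \s and \b are doubled because the patterns are written as Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class WildcardDemo {
    // True if the entire input matches the regular expression.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regular expression matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Simple sets
        System.out.println(matchesAll("p[aeiou]p", "pop"));  // true
        System.out.println(matchesAll("p[aeiou]p", "pgp"));  // false
        // Negation
        System.out.println(matchesAll("[^aeiou]", "x"));     // true
        System.out.println(matchesAll("[^aeiou]", "e"));     // false
        // Predefined sets: digit, whitespace, digit
        System.out.println(matchesAll("\\d\\s\\d", "4 2"));  // true
        // Boundaries
        System.out.println(foundIn("^twisty", "twisty little passages")); // true
        System.out.println(foundIn("^twisty", "a twisty maze"));          // false
        System.out.println(foundIn("\\balike", "all alike"));             // true
        System.out.println(foundIn("\\balike", "lookalike"));             // false
    }
}
```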
Special symbols for particular characters include \t for tab; \n for newline; \a for the alert (bell) character; \e for escape; and \\ for backslash itself. Any character that would otherwise have a special meaning can be preceded by a \ to remove that meaning; in other words, \c always represents the character c. This is how, for example, you would match a * in an expression: by using \*.
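As a small illustration of these escapes, the sketch below checks a few of them with Pattern.matches. The class name EscapeDemo and the helper matchesAll are hypothetical; the patterns appear with doubled backslashes because they are written as Java string literals.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class EscapeDemo {
    // True if the entire input matches the regular expression.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unescaped, * means "zero or more of the preceding character".
        System.out.println(matchesAll("a*b", "aaab"));    // true
        System.out.println(matchesAll("a*b", "a*b"));     // false
        // Escaped, \* matches a literal asterisk.
        System.out.println(matchesAll("a\\*b", "a*b"));   // true
        System.out.println(matchesAll("a\\*b", "aaab"));  // false
        // Special symbols for particular characters.
        System.out.println(matchesAll("\\t", "\t"));      // true: tab
        System.out.println(matchesAll("\\n", "\n"));      // true: newline
        System.out.println(matchesAll("\\a", "\u0007"));  // true: alert (bell)
        System.out.println(matchesAll("\\e", "\u001B"));  // true: escape
    }
}
```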
Special symbols start with the \ character, which is also the character used to introduce an escape sequence in a string literal. This means, for example, that in the string expression "\balike", the actual pattern will consist of a backspace character followed by the word "alike", while "\s" would not be a pattern for whitespace: in Java versions before 15 it causes a compile-time error because \s is not a valid escape sequence there, and from Java 15 onward it denotes a single space character. To use the special symbols within a string expression the leading \ must itself be escaped using \\, so the example strings become "\\balike" and "\\s", respectively. To include an actual backslash in a pattern it has to be escaped twice, using four backslash characters: "\\\\". Each backslash pair becomes a single backslash within the string, resulting in a single backslash pair being included in the pattern, which is then interpreted as a single backslash character.
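The two levels of escaping described above can be made visible by inspecting the strings themselves. This is only a sketch; BackslashDemo, foundIn and the constant names are hypothetical names introduced for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class BackslashDemo {
    // The compiler turns \b into a backspace: 1 + 5 = 6 characters.
    static final String WRONG = "\balike";
    // The compiler turns \\ into one backslash: \, b, "alike" = 7 characters.
    static final String RIGHT = "\\balike";
    // Four backslashes in source become two in the string,
    // which the regex engine reads as one literal backslash.
    static final String BACKSLASH = "\\\\";

    // True if the regular expression matches somewhere inside the input.
    static boolean foundIn(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(WRONG.length());                 // 6
        System.out.println(WRONG.charAt(0) == '\b');        // true
        System.out.println(RIGHT.length());                 // 7
        System.out.println(foundIn(WRONG, "all alike"));    // false: no backspace in input
        System.out.println(foundIn(RIGHT, "all alike"));    // true: word boundary + "alike"
        System.out.println(BACKSLASH.length());             // 2
        // The input "\\" is a string holding a single backslash character.
        System.out.println(Pattern.matches(BACKSLASH, "\\")); // true
    }
}
```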
Regular expressions can also capture parts of the string for later use, either inside the regular expression itself or as a means of picking out parts of the string. You capture parts of the expression inside parentheses. For example, the regular expression (.)-(.*)-\2-\1 matches x-yup-yup-x or ñ-å-å-ñ
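or any other similar string because \1 matches the text captured by the first group, (.), and \2 matches the text captured by the second group, (.*). Groups are numbered from one, in order of the appearance of their opening parenthesis.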
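The capture example can be sketched with java.util.regex.Matcher, whose group(int) method returns the text captured by a numbered group. The class name CaptureDemo and the helper capture are hypothetical; the non-ASCII sample string is written with Unicode escapes only to keep the source file encoding-independent.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class CaptureDemo {
    // The pattern (.)-(.*)-\2-\1 written as a Java string literal.
    static final Pattern MIRROR = Pattern.compile("(.)-(.*)-\\2-\\1");

    // Returns {group 1, group 2} if the whole input matches, otherwise null.
    static String[] capture(String input) {
        Matcher m = MIRROR.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(capture("x-yup-yup-x"))); // [x, yup]
        // \u00F1 is n-tilde and \u00E5 is a-ring, as in the second sample string.
        System.out.println(capture("\u00F1-\u00E5-\u00E5-\u00F1") != null); // true
        // The second and third parts differ, so \2 cannot match.
        System.out.println(capture("x-yup-yap-x") == null); // true
    }
}
```

Here group(1) and group(2) play the role of picking out parts of the string, while \1 and \2 reuse the same captures inside the expression itself.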