Regular Expressions - Refactoring HTML: Improving the Design of Existing Web Applications


\A\s*(<body|<BODY)

Find all documents that start with <body or <BODY and thus don't have a proper html root element.
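As a rough illustration, the pattern above can be tried with Python's `re` module. The helper name `starts_with_body` and the sample strings are assumptions made for this sketch; they do not come from the book.

```python
import re

# Pattern from the text: optional leading whitespace, then <body or <BODY.
BODY_FIRST = re.compile(r"\A\s*(<body|<BODY)")


def starts_with_body(text: str) -> bool:
    """Return True if the first non-whitespace content is <body or <BODY.

    Hypothetical helper for illustration only.
    """
    return BODY_FIRST.search(text) is not None


# A document missing its html root element is flagged.
print(starts_with_body("\n  <body>\n<p>Hello</p>\n</body>"))
# A document with a proper root element is not.
print(starts_with_body("<html><body></body></html>"))
```

Note that the pattern lists only the two spellings `<body` and `<BODY`, so a mixed-case tag such as `<Body` is not reported.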

</[hH][tT][mM][lL]>\s*\Z

Find all documents that end with </html> in various combinations of case, optionally followed by whitespace.
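A minimal sketch of this check in Python follows. The function name `ends_with_html_close` and the sample inputs are hypothetical. One caveat: Python's `re` module treats `\Z` as the absolute end of the string, but the pattern still behaves as described because `\s*` consumes any trailing line breaks first.

```python
import re

# Pattern from the text: </html> in any mix of case, then optional whitespace,
# then the end of the document.
HTML_LAST = re.compile(r"</[hH][tT][mM][lL]>\s*\Z")


def ends_with_html_close(text: str) -> bool:
    """Return True if the document ends with </html> plus optional whitespace.

    Hypothetical helper for illustration only.
    """
    return HTML_LAST.search(text) is not None


print(ends_with_html_close("<html><body></body></html>\n"))
print(ends_with_html_close("<html><body></body></html><!-- trailing -->"))
```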

Table A.6 summarizes all of these patterns.

Table A.6. Regular-Expression Syntax

Pattern        Matches
-------------  -----------------------------------------------------------------
.              Any one character
^              Beginning of line
$              End of line
c*             Zero or more c's
c+             One or more c's
c?             Zero or one c
c*?            Zero or more c's, as few as possible
c+?            One or more c's, as few as possible
c??            Zero or one c, as few as possible
c{count}       Exactly count c's
c{count,}      At least count c's
c{min,max}     At least min c's and at most max c's
[abc]          Any one of the characters between the brackets
[^abc]         Any one of the characters not between the brackets
[a-z]          Any one of the characters from a-z
[a-zA-Z]       Any one of the characters from a-z or A-Z
\A             Beginning of document
\z             End of document
\Z             End of document, but before trailing line break, if any
\b             Boundary of a word, that is, the beginning or end of a word
\B             Not the boundary of a word
\s             Any whitespace character (space, tab, carriage return, line feed)
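A few of the table's entries can be checked with short experiments. The sketch below uses Python's `re` module; the sample strings and variable names are invented for illustration. It leaves out `\z` and `\Z` because Python treats `\Z` as the absolute end of the string, which differs from the table's description.

```python
import re

sample = "<b>one</b> and <b>two</b>"

# Greedy .* takes as much as possible; reluctant .*? takes as little as possible.
greedy = re.findall(r"<b>.*</b>", sample)
reluctant = re.findall(r"<b>.*?</b>", sample)

# c{min,max}: at least two and at most three a's.
two_to_three = re.fullmatch(r"a{2,3}", "aaa") is not None
too_many = re.fullmatch(r"a{2,3}", "aaaa") is not None

# [a-zA-Z] matches letters only; [^abc] matches anything except a, b, or c.
letters = re.findall(r"[a-zA-Z]+", "h1 and H2")
only_abc = re.sub(r"[^abc]", "", "a1b2c3")

# \b requires a word boundary; \B requires that there not be one.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")
inside_words = re.findall(r"\Bcat", "cat concat")

print(greedy)
print(reluctant)
print(letters, only_abc)
print(len(whole_words), len(inside_words))
```

The greedy search returns the whole sample as a single match, while the reluctant version returns each `<b>` element separately. This difference is usually what matters when searching markup.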
