Regular Expressions - Refactoring HTML: Improving the Design of Existing Web Applications

Position

Several metacharacters anchor the regular expression to a particular location in the document without actually matching anything themselves. These include

^

The beginning of a line.

$

The end of a line.

\b

A word boundary: the position between a word character and anything that is not one, such as a space or line break.

\B

Any location that is not a word boundary.

\A

The beginning of the document.

\z

The end of the document.

\Z

The end of the document, unless the document ends with a line break. In this case, it is the position immediately before the final line break.
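As a rough illustration of the difference between line anchors and document anchors, here is a minimal Python sketch. The sample string and variable names are invented for this example. One caveat: anchor spelling varies by regex flavor. In Python's re module, ^ and $ match at every line only when re.MULTILINE is set, and the anchor written \Z matches only at the very end of the string, which corresponds to the \z described above.

```python
import re

# Hypothetical two-line sample; any multi-line text would do.
text = "first line\nsecond line"

# ^ and $ anchor to every line when re.MULTILINE is set.
first_word_per_line = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_word_per_line = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A anchors only to the start of the whole string, regardless of flags.
first_word_of_doc = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# In Python, \Z anchors only to the very end of the string.
last_word_of_doc = re.findall(r"\w+\Z", text, flags=re.MULTILINE)

print(first_word_per_line)  # ['first', 'second']
print(last_word_per_line)   # ['line', 'line']
print(first_word_of_doc)    # ['first']
print(last_word_of_doc)     # ['line']
```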

Because HTML is not very line-oriented, we tend not to use ^ and $ very much. However, \b and \B are quite useful, and \A , \Z , and \z sometimes are, too. For example, \bcat\b matches the word cat but does not match inside the words category , catheter , or abdicate . (Some GUI tools, including BBEdit but not jEdit, give you an option to only match entire words. This is essentially the same as putting \b before and after your expression.)
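The word-boundary behavior described above can be checked with a short Python sketch. The word list and variable names are made up for illustration; the pattern \bcat\b is the one from the text, and \Bcat\B is added only to show the opposite case.

```python
import re

# Hypothetical sample inputs, including the words mentioned in the text.
samples = ["cat", "category", "catheter", "abdicate", "the cat sat"]

# \bcat\b requires a word boundary on both sides of "cat".
whole_word = re.compile(r"\bcat\b")
whole_word_hits = [s for s in samples if whole_word.search(s)]

# \Bcat\B requires that neither side of "cat" be a word boundary,
# so it matches only when "cat" is buried inside a longer word.
inside_word = re.compile(r"\Bcat\B")
inside_word_hits = [s for s in samples if inside_word.search(s)]

print(whole_word_hits)   # ['cat', 'the cat sat']
print(inside_word_hits)  # ['abdicate']
```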

Other possible uses include

\A\s*(<html|<HTML)

Find all documents that start with <html or <HTML and thus don't have a DOCTYPE declaration or a byte order mark.
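One way to try this expression outside a search tool is a small Python check. This is only a sketch: the function name starts_with_html_tag and the sample strings are hypothetical, and it assumes each document has already been read into a string.

```python
import re

# The expression from the text: optional leading whitespace, then the root tag.
ROOT_TAG_FIRST = re.compile(r"\A\s*(<html|<HTML)")

def starts_with_html_tag(document: str) -> bool:
    """Return True if nothing but whitespace precedes <html or <HTML."""
    return ROOT_TAG_FIRST.search(document) is not None

# Hypothetical sample documents.
print(starts_with_html_tag("<html><head></head></html>"))     # True
print(starts_with_html_tag("\n  <HTML lang='en'>"))           # True
print(starts_with_html_tag("<!DOCTYPE html>\n<html></html>")) # False
# In Python, \s does not match the byte order mark U+FEFF,
# so a document that begins with one is not reported.
print(starts_with_html_tag("\ufeff<html></html>"))            # False
```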
