HTML and CSS Reference
Several metacharacters anchor the regular expression to a particular location in the document without actually
matching anything themselves. These include:
^   The beginning of a line.
$   The end of a line.
\b  A word boundary, including a space or line break.
\B  Any location that is not a word boundary.
\A  The beginning of the document.
\z  The end of the document.
\Z  The end of the document, unless the document ends with a line break. In this case, it is the position
    immediately before the final line break.
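As a minimal sketch of how these anchors behave, the following uses Python's re module, which is an assumption here rather than the tool the text has in mind. The sample text and variable names are invented for illustration. In Python, ^ and $ anchor at every line only when re.MULTILINE is set, and Python's \Z does not follow the line-break rule given above, so the sketch leaves \Z and \z out.

```python
import re

# Invented two-line sample; no trailing line break, to keep the results simple.
text = "first line\nsecond line"

# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A anchors only at the very start of the text, even in multiline mode.
document_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# An anchor on its own matches a position, so the matched text is empty.
anchor_only = re.search(r"\b", text)

print(first_words)     # ['first', 'second']
print(last_words)      # ['line', 'line']
print(document_start)  # ['first']
print(repr(anchor_only.group()), anchor_only.start())  # '' 0
```

The last line shows the point made above: the anchor succeeds at position 0 but consumes no characters.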
Because HTML is not very line-oriented, we tend not to use ^ and $ very much. However, \b and \B are quite
useful, and \A, \Z, and \z sometimes are, too. For example, \bcat\b matches the word cat but does not match
inside the words category, catheter, or abdicate. (Some GUI tools, including BBEdit but not jEdit, give you an
option to only match entire words. This is essentially the same as putting \b before and after your expression.)
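The \bcat\b example can be checked with a short script. This is a sketch in Python's re module, assumed here only for illustration; the word list comes from the paragraph above, and the variable names are hypothetical.

```python
import re

# The words from the example; only the first is "cat" as a whole word.
words = ["cat", "category", "catheter", "abdicate"]

whole_word = re.compile(r"\bcat\b")  # cat with a word boundary on both sides
anywhere = re.compile(r"cat")        # cat as a plain substring
inside = re.compile(r"\Bcat")        # cat that does NOT start at a word boundary

whole_matches = [w for w in words if whole_word.search(w)]
substring_matches = [w for w in words if anywhere.search(w)]
inside_matches = [w for w in words if inside.search(w)]

print(whole_matches)      # ['cat']
print(substring_matches)  # ['cat', 'category', 'catheter', 'abdicate']
print(inside_matches)     # ['abdicate']
```

The unanchored pattern hits all four words, while the \b version keeps only the standalone word, which is what a "match entire words" option does.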
Other possible uses include
Find all documents that start with <html or <HTML and thus don't have a DOCTYPE declaration or a byte