Alternation: |
The vertical bar, |, allows you to choose between two (or more) possible values. For example, suppose you want to search
for all years in the twentieth or twenty-first century: 1904, 1952, 1999, 2001, 2059, and so on. The basic rule is
that the first two characters must be either 19 or 20. The second two characters must be digits. 19\d\d
matches all years in the twentieth century. [1] 20\d\d matches all years in the twenty-first century.
(19\d\d)|(20\d\d) matches both sets of years. We could also write this as (19|20)\d\d, that is, either 19 or
20 followed by two digits.
[1] Pedants beware: Because there was no year 0, 1900 is really in the nineteenth century and 2000 is the twentieth, but I'm going to ignore
that.
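One way to check that the two spellings match the same strings is a short Python sketch using the standard `re` module. The helper `is_year` and the sample strings are illustrative, not from the text. `re.fullmatch` is used so that the pattern has to cover the whole string; with `re.search`, a string such as `19999` would also be accepted because it contains `1999`.

```python
import re

# Two ways to write "19 or 20 followed by two digits".
SPELLED_OUT = re.compile(r"(19\d\d)|(20\d\d)")
FACTORED = re.compile(r"(19|20)\d\d")

def is_year(pattern: re.Pattern, text: str) -> bool:
    # fullmatch: the entire string must match, so "19999" or "x1999" is rejected
    return pattern.fullmatch(text) is not None

samples = ["1904", "1952", "1999", "2001", "2059", "1899", "2100", "19x9"]
for s in samples:
    # Print the sample and the verdict of each spelling side by side
    print(s, is_year(SPELLED_OUT, s), is_year(FACTORED, s))
```

Both columns agree for every sample: factoring the common \d\d out of the alternation changes how the pattern reads, not what it matches.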
Alternation is also important for matching HTML tags. For example, suppose you want a general expression for
matching all start tags. The problem you run into is that there are three ways an attribute can appear, and each
has its own regular expression:
name=value
[a-zA-Z]+\s*=\s*[^\s'">]+
name="value"
[a-zA-Z]+\s*=\s*"[^">]*"
name='value'
[a-zA-Z]+\s*=\s*'[^'>]*'
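To see that each of the three patterns accepts only its own form, here is a minimal Python sketch that tries each one against a whole string. The dictionary `ATTRIBUTE_FORMS`, the helper `attribute_form`, and the sample attributes are hypothetical names chosen for illustration; the patterns themselves are the three listed above, with Python's quoting chosen so that no backslash escapes are needed.

```python
import re
from typing import Optional

# One pattern per way an attribute can be written (patterns from the text).
ATTRIBUTE_FORMS = {
    "unquoted": re.compile(r"""[a-zA-Z]+\s*=\s*[^\s'">]+"""),
    "double-quoted": re.compile(r'[a-zA-Z]+\s*=\s*"[^">]*"'),
    "single-quoted": re.compile(r"[a-zA-Z]+\s*=\s*'[^'>]*'"),
}

def attribute_form(text: str) -> Optional[str]:
    """Return the name of the first form whose pattern matches the whole string, or None."""
    for form, pattern in ATTRIBUTE_FORMS.items():
        if pattern.fullmatch(text):
            return form
    return None

for sample in ["width=100", 'class="note"', "title='A title'", "title=A title"]:
    print(f"{sample!r:20} -> {attribute_form(sample)}")
```

The last sample is rejected by all three: an unquoted value cannot contain whitespace, and the other two forms require quotes.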
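We can combine these regular expressions with an alternation. Because each attribute is separated from the tag name and from the previous attribute by whitespace, the \s+ belongs inside the repeated group, like so:
<[a-zA-Z]+(\s+([a-zA-Z]+\s*=\s*[^\s'">]+|[a-zA-Z]+\s*=\s*"[^">]*"|[a-zA-Z]+\s*=\s*'[^'>]*'))*\s*>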
This finds all single-quoted, double-quoted, and nonquoted attributes. (It also finds name=value parameters in
URL query strings, which was not intended.)
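A minimal Python sketch of the start-tag pattern, assembled from the three attribute patterns, follows. It requires whitespace (\s+) before each attribute and allows optional whitespace before the closing >. The names `ATTRIBUTE`, `START_TAG`, and `matches`, and the sample HTML, are illustrative. It uses non-capturing groups, (?:...), which match the same text as plain parentheses but keep `finditer` results focused on the whole tag.

```python
import re

# The three attribute alternatives: unquoted | double-quoted | single-quoted.
ATTRIBUTE = (
    r"""[a-zA-Z]+\s*=\s*[^\s'">]+"""
    r'|[a-zA-Z]+\s*=\s*"[^">]*"'
    r"|[a-zA-Z]+\s*=\s*'[^'>]*'"
)

# A start tag: "<", a tag name, zero or more whitespace-separated attributes,
# optional trailing whitespace, then ">".
START_TAG = re.compile(r"<[a-zA-Z]+(?:\s+(?:" + ATTRIBUTE + r"))*\s*>")

html = """<p class="intro" id=first title='Hi'>Text</p> <br> <img src="a.png">"""

# finditer scans left to right; end tags such as </p> fail at the "/" and are skipped.
matches = [m.group() for m in START_TAG.finditer(html)]
for tag in matches:
    print(tag)
```

This finds the three start tags and skips the end tag. A self-closing form such as `<br />` is not matched, because the pattern allows nothing but whitespace between the last attribute and the closing >.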