HTML and CSS Reference
tools will convert all tag and attribute names to lowercase. They will also change entity names that need to be
in lowercase.
You can also accomplish this with regular expressions. Because HTML element and attribute names are
composed almost exclusively of the Latin letters A to Z and a to z (the heading elements h1 through h6 are
the main exception), this isn't too difficult. Let's start with the element
names. There are likely to be thousands, perhaps millions, of these, so you don't want to fix them by hand.
Tags are easy to search for. This regular expression will find all start-tags that contain at least one capital
letter:
<[a-zA-Z]*[A-Z]+[a-zA-Z]*
This regular expression will find all end-tags that contain at least one capital letter:
</[a-zA-Z]*[A-Z]+[a-zA-Z]*>
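As a rough illustration, the two patterns above can be tried out in Python before running them against a whole site. This is only a sketch: the function name find_uppercase_tags and the sample markup are made up for the example, and Python is used simply because its re module accepts these patterns unchanged.

```python
import re

# The two patterns from the text, unchanged.
START_TAG_WITH_CAPITAL = re.compile(r"<[a-zA-Z]*[A-Z]+[a-zA-Z]*")
END_TAG_WITH_CAPITAL = re.compile(r"</[a-zA-Z]*[A-Z]+[a-zA-Z]*>")


def find_uppercase_tags(html):
    """Return (start_tags, end_tags): the matches that contain a capital letter."""
    return (START_TAG_WITH_CAPITAL.findall(html),
            END_TAG_WITH_CAPITAL.findall(html))


# Hypothetical sample markup mixing uppercase, mixed-case, and lowercase tags.
sample = '<TABLE><tr><Td class="x">cell</Td></tr></TABLE>'
starts, ends = find_uppercase_tags(sample)
print(starts)  # ['<TABLE', '<Td']
print(ends)    # ['</Td>', '</TABLE>']
```

The lowercase <tr> and </tr> are left alone. One limitation worth knowing: because the character classes admit only letters, the end-tag pattern does not match </H1>, and the start-tag pattern matches only the <H portion of <H1>.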
Entities are also easy. This regular expression finds all entity references that contain a capital letter other than
the initial letters:
&[A-Za-z][A-Za-z][A-Z]+[A-Za-z]*;
I set up the preceding regular expression to require a capital letter in the third position or later. This avoids
accidentally triggering on references such as &Omega; that should have a single initial capital letter and on
references such as &AElig; that have two initial capital letters. This may miss some cases, such as &Amp; and
&AMp;, but those are rare in practice. Usually entity references are either all uppercase or all lowercase. If any
such mixed cases exist, we'll find them later with xmllint and fix them by hand.
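To see which references the entity pattern catches and which it lets through, here is a small Python check. It is a sketch only: the pattern is written with no whitespace between the character classes, and the list of sample references is chosen for illustration.

```python
import re

# Entity pattern from the text: two characters of any case, then at least
# one capital letter, then any further letters, then the closing semicolon.
MIXED_CASE_ENTITY = re.compile(r"&[A-Za-z][A-Za-z][A-Z]+[A-Za-z]*;")

# Illustrative samples: some need fixing, some are legitimate as written.
samples = ["&AMP;", "&COPY;", "&Omega;", "&AElig;", "&amp;", "&Amp;", "&AMp;"]

flagged = [s for s in samples if MIXED_CASE_ENTITY.fullmatch(s)]
print(flagged)  # ['&AMP;', '&COPY;']
```

&Omega; and &AElig; are correctly left alone, while &Amp; and &AMp; slip through, which matches the trade-off described above.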
Attributes are trickier to find because the pattern to find them (name=) may appear inside the plain text of the
document. I much prefer to use Tidy or TagSoup to fix these. However, if you know you have a large problem
with particular attributes, it's easy to do a search and replace for individual ones, for instance, HREF= to href=.
As long as you aren't writing about HTML, that string is unlikely to appear in plain text content.
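A per-attribute search and replace of this kind needs no regular expression at all. The following Python sketch assumes a hypothetical helper named lowercase_attribute and a made-up sample page. It is a blind literal replacement, so it would also rewrite the string if it appeared in plain text content.

```python
def lowercase_attribute(html, name):
    """Replace every literal all-uppercase NAME= with name= (plain string replace)."""
    return html.replace(name.upper() + "=", name.lower() + "=")


# Hypothetical sample page with two uppercase HREF attributes.
page = '<a HREF="index.html">Home</a> <a HREF="about.html">About</a>'
print(lowercase_attribute(page, "href"))
# <a href="index.html">Home</a> <a href="about.html">About</a>
```

Only the all-uppercase spelling is replaced. A mixed-case variant such as Href= would need its own pass.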
Sometimes your initial find will discover that only a few tags use uppercase. For instance, if there are lots of
uppercase table tags, you can quickly change <TD> to <td>, </TD> to </td>, <TR> to <tr>, and so forth without
even using regular expressions. If the problem is a little broader, consider using Tidy or TagSoup. If that doesn't
work, you'll need a tool that can replace text while changing its case. jEdit can't do this. However, Perl and
BBEdit can. Use \L in the replacement pattern to convert all characters to lowercase. For example, let's start
with the regular expression for start-tags:
(<[a-zA-Z]*[A-Z]+[a-zA-Z]*)
This expression will replace it with its lowercase equivalent:
\L\1
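If neither Perl nor BBEdit is at hand, the same case-changing replacement can be approximated in Python. Python's re module does not support \L in replacement strings, so this sketch passes a function to re.sub instead. The function name lowercase_tags and the sample markup are assumptions for the example, as is applying the same treatment to the end-tag pattern.

```python
import re

# Start-tag pattern from the text, with its capturing group.
START_TAG_WITH_CAPITAL = re.compile(r"(<[a-zA-Z]*[A-Z]+[a-zA-Z]*)")
# Assumption: the end-tag pattern is wrapped in a group the same way.
END_TAG_WITH_CAPITAL = re.compile(r"(</[a-zA-Z]*[A-Z]+[a-zA-Z]*>)")


def lowercase_tags(html):
    """Lowercase every matched start-tag and end-tag name, like \\L\\1."""
    def to_lower(match):
        return match.group(1).lower()

    html = START_TAG_WITH_CAPITAL.sub(to_lower, html)
    return END_TAG_WITH_CAPITAL.sub(to_lower, html)


# Hypothetical sample markup with uppercase tags and one uppercase attribute.
result = lowercase_tags('<TABLE><TR><TD CLASS="x">cell</TD></TR></TABLE>')
print(result)
# <table><tr><td CLASS="x">cell</td></tr></table>
```

The attribute name CLASS is untouched because the patterns match only the tag name, so attributes still need Tidy, TagSoup, or a separate search and replace.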