Well-Formedness - Refactoring HTML: Improving the Design of Existing Web Applications

Replace Imaginary Entity References

Make sure all entity references used in the document are defined.

&copyright; 2007 TIC Corp.

Motivation

Occasionally, authors begin to use entity references that simply don't exist. Sometimes it's a simple typo, such as &apm; instead of &amp;. Sometimes it's a misremembered name, such as &tm; instead of &trade; or &copyright; instead of &copy;. Either way, this causes display problems in all browsers and should be fixed.

Potential Trade-offs

None. This is only good.

Mechanics

The hardest problem is finding these imaginary entity references, because there's not necessarily any rhyme or reason to them. Often, the first time you realize there's a problem is while browsing your site. If you're lucky, the reference will appear in the plain text like this:

&copyright; 2007 TIC Corp.

If not, the browser will just drop it out completely:

2007 TIC Corp.

The same mistakes do tend to repeat themselves, so once you've noticed a problem, a straight search and replace will usually find and fix all other occurrences.
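
A straight search and replace of this kind can be sketched in Python. The fix_known_typos function and the KNOWN_FIXES table are hypothetical, filled in from the misspellings mentioned in the Motivation; extend the table with whatever mistakes turn up on your own site.

```python
# Hypothetical table of known misspellings and their correct entity
# references, built from the examples discussed in this section.
KNOWN_FIXES = {
    "&copyright;": "&copy;",
    "&tm;": "&trade;",
    "&apm;": "&amp;",
}


def fix_known_typos(text, fixes=KNOWN_FIXES):
    """Replace every occurrence of each known bad reference with its fix."""
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


if __name__ == "__main__":
    print(fix_known_typos("&copyright; 2007 TIC Corp."))  # &copy; 2007 TIC Corp.
```

Matching the whole reference, from the ampersand through the semicolon, keeps the replacement from touching ordinary words such as "copyright" in the prose.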

Otherwise, validation (or at least well-formedness checking) is necessary to identify these issues. Once a validator finds such imaginary entity references, you can fix them by hand if they aren't too numerous, or with a targeted search and replace if they are.
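
Short of running a full validator, a rough first pass can be scripted. This Python sketch flags every named reference whose name is not in the HTML 4 entity set that Python's standard html.entities.name2codepoint table provides, plus XML's &apos;. The find_imaginary_entities function is hypothetical, and it is a heuristic scan rather than a validator: it only sees references of the form &name; and does not check the document against its DTD.

```python
import re
from html.entities import name2codepoint

# Names defined by HTML 4 (as tabulated in Python's standard library),
# plus "apos", which XML and XHTML define but HTML 4 does not.
KNOWN_NAMES = set(name2codepoint) | {"apos"}

# Named references only; numeric references such as &#xA5; start with '#'
# after the ampersand and therefore never match.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def find_imaginary_entities(text):
    """Return (line, column, name) for each reference with an unknown name.

    Line and column numbers are 1-based.
    """
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in ENTITY_REF.finditer(line):
            name = match.group(1)
            if name not in KNOWN_NAMES:
                problems.append((line_no, match.start() + 1, name))
    return problems


if __name__ == "__main__":
    sample = "&copyright; 2007 TIC Corp.\n&copy; ok &tm;"
    for line_no, column, name in find_imaginary_entities(sample):
        print(f"line {line_no}, column {column}: &{name};")
```

Names are compared case-sensitively, as XML requires, so &Copy; is reported even though &copy; is fine.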

Occasionally, you'll find someone has invented an entity reference that perhaps should exist but doesn't, such as a made-up name for the yen sign ¥ or &bet; for the Hebrew letter ב. Although it's theoretically possible to define new entity references such as these in the internal DTD subset or the external DTD, I do not recommend this. XML parsers can handle such definitions, but browsers cannot. Either replace the references with the actual characters (especially if you've already re-encoded the document in UTF-8) or use a numeric character reference such as &#xA5; or &#x5D1;.
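
Replacing invented references can also be scripted. This Python sketch assumes you've already decided what each invented name was meant to produce; the replace_invented_entities function and the mapping passed to it are hypothetical. By default it writes a hexadecimal numeric character reference, and with numeric=False it inserts the literal character instead, which only makes sense once the document is saved as UTF-8.

```python
import re

# Named references only; numeric references never match.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def replace_invented_entities(text, meanings, numeric=True):
    """Replace each &name; listed in meanings with its intended character.

    meanings maps an invented entity name to the character it was meant to
    produce. References not listed in meanings are left untouched.
    """

    def substitute(match):
        char = meanings.get(match.group(1))
        if char is None:
            return match.group(0)  # not an invented name we know; keep it
        return f"&#x{ord(char):X};" if numeric else char

    return ENTITY_REF.sub(substitute, text)


if __name__ == "__main__":
    # Hypothetical mapping: &bet; was meant to be the Hebrew letter bet.
    meanings = {"bet": "\u05d1"}
    print(replace_invented_entities("&bet; &amp;", meanings))  # &#x5D1; &amp;
```

Hexadecimal references are used here only because they match the Unicode code charts; a decimal reference such as &#1489; is equally valid.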
