Discussion
HTML is a markup language: it uses certain characters as markers that have a special
meaning. To include literal instances of these characters in a page, you must encode
them so that they are not interpreted as having their special meanings. For example,
encode < as &lt; to keep a browser from interpreting it as the beginning of a tag.
Furthermore, there are actually two kinds of encoding, depending on the context in which
you use a character. One encoding is appropriate for HTML text, another is used for
text that is part of a URL in a hyperlink.
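As a rough illustration of the two kinds of encoding, the following Python sketch applies each one to the same string. It relies only on the standard library (html.escape and urllib.parse.quote); the sample string is made up for the example and does not come from any table used in this chapter.

```python
import html
from urllib.parse import quote

# Made-up sample text containing characters that are special in HTML and URLs.
text = 'AT&T <"Ma Bell">'

# Encoding for HTML text: &, <, >, and " are replaced by entities.
html_encoded = html.escape(text)

# Encoding for a URL component: characters outside the unreserved set
# are replaced by %XX escapes (safe="" means "/" is escaped as well).
url_encoded = quote(text, safe="")

print(html_encoded)  # AT&amp;T &lt;&quot;Ma Bell&quot;&gt;
print(url_encoded)   # AT%26T%20%3C%22Ma%20Bell%22%3E
```

The two results differ for every special character, which is why applying the wrong encoding for the context produces incorrect output.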
The MySQL table-display scripts shown in Recipes 18.2 and 18.3 are simple
demonstrations of how to produce web pages using programs. But with one exception, the
scripts have a common failing: they take no care to properly encode special characters
that occur in the information retrieved from the MySQL server. (The exception is the
JSP version of the script. The <c:out> tag used there handles encoding automatically,
as we'll discuss shortly.)
As it happens, I deliberately chose to display information that is unlikely to contain any
special characters, so the scripts should work properly even in the absence of any
encoding. In the general case, however, it's unsafe to assume that a query result contains
no special characters, so you must be prepared to encode it for display in a web page.
Neglecting to do so can cause scripts to generate pages containing malformed
HTML that displays incorrectly.
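To make the failure concrete, here is a minimal Python sketch of a table cell built with and without encoding. The value is hypothetical and merely stands in for a column returned by a query; html.escape is the standard-library function used for the encoding.

```python
import html

# Hypothetical value, standing in for a column value returned by a query.
value = "weight < 10 & height > 5"

# Without encoding, the literal < and & end up in the page markup,
# where a browser may try to interpret them as a tag or entity.
unsafe_cell = "<td>" + value + "</td>"

# With encoding, the special characters are displayed literally.
safe_cell = "<td>" + html.escape(value) + "</td>"

print(unsafe_cell)  # <td>weight < 10 & height > 5</td>
print(safe_cell)    # <td>weight &lt; 10 &amp; height &gt; 5</td>
```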
This recipe describes how to handle special characters, beginning with some general
principles, then discusses how each API implements encoding support. The API-
specific examples show how to process information drawn from a database table, but
they can be adapted to any content you include in a web page, no matter its source.
General encoding principles
One form of encoding applies to characters used in writing HTML constructs; another
applies to text included in URLs. It's important to understand this distinction to avoid
encoding text the wrong way.
Encoding text for inclusion in a web page is an entirely different issue
from encoding special characters in data values for inclusion in an
SQL statement. Recipe 2.5 discusses the latter technique.
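The distinction can be sketched in a few lines of Python. This example is an assumption-laden stand-in: it uses the standard-library sqlite3 module with an in-memory database rather than a MySQL connection, and the table name phrase and column phrase_val are hypothetical. The point is only that a placeholder handles the value on its way into the SQL statement, whereas HTML encoding is applied separately when the value goes into the page.

```python
import html
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phrase (phrase_val TEXT)")

value = "It's <b>bold</b>"

# SQL side: the ? placeholder lets the driver handle the quote character.
conn.execute("INSERT INTO phrase (phrase_val) VALUES (?)", (value,))

# The value comes back from the database unchanged.
(stored,) = conn.execute("SELECT phrase_val FROM phrase").fetchone()

# HTML side: encode the retrieved value before placing it in the page.
cell = "<td>" + html.escape(stored) + "</td>"
conn.close()

print(cell)  # <td>It&#x27;s &lt;b&gt;bold&lt;/b&gt;</td>
```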
Encoding characters that are special in HTML. HTML markup uses < and > characters to
begin and end tags, & to begin special entity names (such as &nbsp; to signify a
nonbreaking space), and " to quote attribute values in tags (such as <p align="left">).
Consequently, to display literal instances of these characters, you should encode them