Java Reference
In-Depth Information
out of a database that's never stored in a filesystem. ISBN%3D1565924851 selects the
particular topic from the database by its ISBN, cafeaulaitA specifies who gets
the referral fee if a purchase is made from this link, and 002-3777605-3043449 is a session
key used to track the visitor's path through the site.
Some URIs aren't at all hierarchical, at least in the filesystem sense. For example,
snews://secnews.netscape.com/netscape.devs-java has a path of /netscape.devs-java. Although
there's some hierarchy to the newsgroup names indicated by the period between
netscape and devs-java, it's not encoded as part of the URI.
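As a minimal sketch of the point above, the standard java.net.URI class can split this URI into its parts. The class name NewsUriDemo and the helper pathOf are hypothetical names chosen for illustration. Note that the path comes back as a single segment; the period inside it has no structural meaning to the parser.

```java
import java.net.URI;

class NewsUriDemo {
    // Returns the decoded path component of the given URI string.
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        URI u = URI.create("snews://secnews.netscape.com/netscape.devs-java");
        System.out.println(u.getScheme()); // snews
        System.out.println(u.getHost());   // secnews.netscape.com
        System.out.println(u.getPath());   // /netscape.devs-java
    }
}
```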
The scheme part is composed of lowercase letters, digits, and the plus sign, period, and
hyphen. The other three parts of a typical URI (authority, path, and query) should each
be composed of the ASCII alphanumeric characters (i.e., the letters A-Z, a-z, and the
digits 0-9). The punctuation characters - _ . ! and ~ may also be used.
Delimiters such as / ? & and = may be used for their predefined purposes. All other
characters, including non-ASCII alphanumerics such as á and ζ as well as delimiters
not being used as delimiters, should be escaped: each byte of the character as encoded
in UTF-8 is written as a percent sign (%) followed by its two-digit hexadecimal code.
For instance, in UTF-8, á is the two bytes 0xC3 0xA1, so it would be encoded as
%c3%a1. The Chinese character 木 is Unicode code point 0x6728. In UTF-8, this is
encoded as the three bytes E6, 9C, and A8. Thus, in a URI it would be encoded as %E6%9C%A8.
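The byte-by-byte escaping described above can be sketched in a few lines of Java. This is an illustration only: the class name PercentEncodeDemo and the method percentEncodeAll are hypothetical, and the method escapes every byte of its input without checking whether the character actually needs escaping. The hexadecimal digits are emitted in uppercase; they are case-insensitive, so %C3%A1 and %c3%a1 denote the same bytes.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Writes each UTF-8 byte of the input as '%' plus two hexadecimal digits.
    static String percentEncodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative byte values format as two digits.
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeAll("\u00E1")); // %C3%A1  (U+00E1, a with acute accent)
        System.out.println(percentEncodeAll("\u6728")); // %E6%9C%A8  (U+6728)
    }
}
```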
If you don't hexadecimally encode non-ASCII characters like this, but just include them
directly, then instead of a URI you have an IRI (an Internationalized Resource
Identifier). IRIs are easier to type and much easier to read, but a lot of software and protocols
expect and support only ASCII URIs.
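When software requires the ASCII form, java.net.URI can convert an IRI-style string for you. The sketch below assumes a made-up address on www.example.com; IriToUriDemo and toAsciiUri are hypothetical names. URI.toASCIIString() percent-encodes the non-ASCII characters as UTF-8 and leaves the rest of the string alone.

```java
import java.net.URI;

class IriToUriDemo {
    // Parses a string that may contain non-ASCII characters and
    // returns the equivalent ASCII-only URI.
    static String toAsciiUri(String iri) {
        return URI.create(iri).toASCIIString();
    }

    public static void main(String[] args) {
        // \u6728 is the character at code point U+6728.
        System.out.println(toAsciiUri("http://www.example.com/\u6728"));
        // http://www.example.com/%E6%9C%A8
    }
}
```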
Punctuation characters such as / and @ must also be encoded with percent escapes if
they are used in any role other than what's specified for them in the scheme-specific
part of a particular URL. For example, the forward slashes in the URI
http://www.cafeaulait.org/books/javaio2/ do not need to be encoded as %2F because they serve to delimit
the hierarchy as specified for the http URI scheme. However, if a filename includes a /
character (for instance, if the last directory were named Java I/O instead of javaio2 to
more closely match the name of the topic), the URI would have to be written as
http://www.cafeaulait.org/books/Java%20I%2FO/. This is not as far-fetched as it might sound
to Unix or Windows users. Mac filenames frequently include a forward slash. Filenames
on many platforms often contain characters that need to be encoded, including @, $, +,
=, and many more. And of course URLs are, more often than not, not derived from
filenames at all.
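A minimal sketch of escaping a single path segment follows, assuming the conservative character set described earlier (ASCII letters, digits, and - _ . ! ~ pass through; everything else, including /, is escaped). PathSegmentDemo and encodeSegment are hypothetical names, not part of any library. The sketch also shows that java.net.URI keeps the escaped form available through getRawPath(), while getPath() decodes it.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

class PathSegmentDemo {
    // Percent-encodes one path segment so that delimiters such as '/'
    // inside the name are not mistaken for hierarchy separators.
    static String encodeSegment(String segment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            boolean safe = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || "-_.!~".indexOf(c) >= 0;
            if (safe) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dir = encodeSegment("Java I/O");
        System.out.println(dir); // Java%20I%2FO

        URI u = URI.create("http://www.cafeaulait.org/books/" + dir + "/");
        System.out.println(u.getRawPath()); // /books/Java%20I%2FO/
        System.out.println(u.getPath());    // /books/Java I/O/
    }
}
```

Because getPath() decodes %2F back to a slash, the decoded path no longer shows which slashes were delimiters; code that needs to split on segment boundaries would work from getRawPath() instead.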
URLs
A URL is a URI that, as well as identifying a resource, provides a specific network
location for the resource that a client can use to retrieve a representation of that resource.
By contrast, a generic URI may tell you what a resource is, but not actually tell you where