HTML and CSS Reference
doesn't tell us much about where we are going. We could be visiting an HTML example,
a 1980s pop video of Rick Astley, or some horrid drive-by malware download. Short URLs
may save space, but they are both cryptic and potentially dangerous. Further, we must
hope that the service that powers our shortened URL lives on and that the usage data it
gleans from watching users traverse the link is not used for troubling ends.
Location, Not Meaning
The primary problem with URLs is that they define location rather than meaning. In other
words, URLs specify where something is located on the Web, not what it is or what it's
about. This might not seem to be a big deal, but it is. For example, the text of the HTML5
specification is a useful document and certainly has an address at the W3C Web site. But
does it live in other places on the Internet? For certain, it can be found at its original parent,
the WHATWG, and it is likely mirrored in a variety of locations. However, if we rely solely on the
W3C server and it is unreachable, or DNS services fail to resolve the host, we are stuck, even
though copies exist elsewhere. Rather than trying to find a particular document, wherever it might be on
the Internet, Web users try to go to a particular location. Rather than talking about where
something is, Web users should try to talk about what that something is.
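To make the distinction concrete, here is a minimal sketch of asking for a document by name rather than by location. Everything in it is an assumption for illustration: the identifier `urn:example:html5-spec`, the three placeholder addresses, and the `CATALOG` table itself, since no such registry is described here. The caller supplies the `fetch` function, so the sketch makes no claim about how retrieval is actually performed.

```python
from typing import Callable, Optional

# Hypothetical catalog: one name for the document, many places it may live.
# The identifier and the addresses are illustrative placeholders, not real services.
CATALOG = {
    "urn:example:html5-spec": [
        "https://w3c-server.example/html5",
        "https://whatwg-server.example/html5",
        "https://mirror.example/html5",
    ],
}


def fetch_by_name(name: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Try each known location for `name` until one returns content."""
    for url in CATALOG.get(name, []):
        try:
            body = fetch(url)
        except OSError:
            # Host unreachable or name resolution failed: try the next location.
            continue
        if body is not None:
            return body
    # Unknown name, or every known location failed.
    return None
```

With a scheme like this, an unreachable server is an inconvenience rather than a dead end, because the request names the document and any working location will do. In practice, `fetch` might wrap a call such as `urllib.request.urlopen`, whose network errors derive from `OSError`.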
Talking about what a document is rather than where it is makes sense when you consider
how information is organized outside the Internet. In general, few people talk about which
library carries a particular book, or what shelf it is on. The relevant information is the title of
the book, its author, and perhaps some other information. But what happens if two or more
books have the same title, or two authors have the same name? This is actually quite common.
Generally, a book should have a unique identifier, such as an ISBN, that, when
combined with other descriptive information, such as the author, publisher, and publication
date, uniquely describes the book. This naming scheme enables people to specify a particular
book and then hunt it down.
The Web, however, isn't as orderly as a library. On the Web, people name their documents
whatever they like, and search robots organize their indexes however they like. Categorizing
things is difficult. The only unique identifier a document has is its URL, which simply says where
the document lives. But how many URLs does the HTML5 specification have? A document
might exist in many places. Even worse than a document with multiple locations, what
happens when the content at the location changes? Perhaps a particular URL points
to information about dogs one day and cats the next. This is how the Web really is. While
search engines like Google do a great deal to sort this mess out, much remains to be fixed,
and considerable research is under way to address some of the shortcomings
of Web addressing and data meaning.
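One simple way to see the gap between where a document is and what it is: a fingerprint computed from the content stays the same wherever the bytes are stored, and changes when the content does. The sketch below uses SHA-256 from Python's standard `hashlib` module. The function names are hypothetical, and this is only an illustration of the idea, not one of the addressing schemes discussed in this chapter.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a hex digest that depends only on the content, not its location."""
    return hashlib.sha256(content).hexdigest()


def same_document(a: bytes, b: bytes) -> bool:
    """True if two copies, possibly fetched from different URLs, are byte-identical."""
    return fingerprint(a) == fingerprint(b)


def has_changed(recorded_digest: str, current_content: bytes) -> bool:
    """True if the content now at a location no longer matches an earlier digest."""
    return fingerprint(current_content) != recorded_digest
```

Two mirrors of the same specification would yield the same fingerprint, while a URL that served a page about dogs yesterday and cats today would not. A byte-level digest is deliberately strict, though: even a trivial edit produces a different value, so it captures identity of content, not meaning.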
New Addressing Schemes: URNs, URCs, and URIs
Consider the information that describes this book. It may have a unique identifier,
such as an ISBN. It has many characteristics that describe it, such as its cost,