URLs - HTML and CSS: The Complete Reference


doesn't tell us much about where we are going. We could be visiting an HTML example,
a 1980s pop video of Rick Astley, or some horrid drive-by malware download. Short URLs
may save space, but they are not only cryptic but also potentially dangerous. Further, we
must hope that the service that powers our shortened URL lives on and that the usage data
it gleans from watching users traverse the link is not used for troubling ends.

Location, Not Meaning

The primary problem with URLs is that they define location rather than meaning. In other
words, URLs specify where something is located on the Web, not what it is or what it's
about. This might not seem to be a big deal, but it is. For example, the text of the HTML5
specification is a useful document and certainly has an address at the W3C Web site. But
does it live in other places on the Internet? Certainly: it can be found at its original parent,
the WHATWG, and is likely mirrored in a variety of locations. However, if we rely solely on
the W3C address and that server is unreachable, or DNS fails to resolve the host, we are
stuck. Rather than trying to find a particular document, wherever it might be on the
Internet, Web users try to go to a particular location. Rather than talking about where
something is, Web users should try to talk about what that something is.
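To make the contrast concrete, here is a minimal sketch of asking for a document by name and accepting it from whichever location answers. Everything specific in it is an assumption made for illustration: the identifier `urn:example:html5-spec`, the `example.org`, `example.net`, and `example.com` addresses, the `LOCATIONS` table, and the `fetch_by_name` helper are hypothetical, and the network is simulated by `fake_fetch` so the sketch runs offline.

```python
from typing import Callable, Optional

# Hypothetical name-to-locations table. The identifier and hosts are
# placeholders for illustration, not real addresses.
LOCATIONS = {
    "urn:example:html5-spec": [
        "https://primary.example.org/html5/",
        "https://mirror-a.example.net/html5/",
        "https://mirror-b.example.com/html5/",
    ],
}


def fetch_by_name(name: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the document from the first known location that responds.

    `fetch` is any function that returns the document text for a URL,
    or None when the host cannot be reached or resolved.
    """
    for url in LOCATIONS.get(name, []):
        document = fetch(url)
        if document is not None:
            return document
    return None  # every known location failed, or the name is unknown


def fake_fetch(url: str) -> Optional[str]:
    """Simulate an outage of the primary host: only one mirror answers."""
    reachable = {
        "https://mirror-a.example.net/html5/": "<html>HTML5 specification</html>",
    }
    return reachable.get(url)


print(fetch_by_name("urn:example:html5-spec", fake_fetch))
```

A reader who holds only the primary address is stuck when that host fails. A reader who holds the name still gets the document, provided some service maintains the mapping from names to locations.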

Beyond URLs

Talking about what a document is rather than where it is makes sense when you consider
how information is organized outside the Internet. In general, few people talk about which
library carries a particular book, or what shelf it is on. The relevant information is the title
of the book, its author, and perhaps some other information. But what happens if two or
more books have the same title, or two authors have the same name? This is actually quite
common. Generally, a book should have a unique identifier such as an ISBN that, when
combined with other descriptive information, such as the author, publisher, and publication
date, uniquely describes the book. This naming scheme enables people to specify a particular
book and then hunt it down.

The Web, however, isn't as orderly as a library. On the Web, people name their documents
whatever they like, and search robots organize their indexes however they like. Categorizing
things is difficult. The only unique item for documents is the URL, which simply says where
the document lives. But how many URLs does the HTML5 specification have? A document
might exist in many places. Even worse than a document with multiple locations, what
happens when the content at the location changes? Perhaps a particular URL points to
information about dogs one day and cats the next. This is how the Web really is. While
search engines like Google do a great deal to sort this mess out, much remains to be fixed,
and so considerable research is being performed to address some of the shortcomings
of Web addressing and data meaning.
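Both problems described above, one document at several locations and one location whose content changes, can be illustrated with a small sketch. It uses a SHA-256 hash of the content as a stand-in for "what the document is". That choice is an assumption made for illustration, not a scheme the text proposes, and it only recognizes copies that are identical byte for byte. The URLs, page text, and the `fingerprint` helper are hypothetical.

```python
import hashlib


def fingerprint(content: str) -> str:
    """Identify a document by what it contains, not where it lives."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Hypothetical snapshots of three locations on two consecutive days.
# The URLs and the page text are made up for illustration.
monday = {
    "https://site-a.example.org/spec": "HTML5 specification text",
    "https://site-b.example.net/copy": "HTML5 specification text",
    "https://pets.example.com/today": "All about dogs",
}
tuesday = dict(monday)
tuesday["https://pets.example.com/today"] = "All about cats"

# Two locations, one document: the fingerprints match.
same_document = (
    fingerprint(monday["https://site-a.example.org/spec"])
    == fingerprint(monday["https://site-b.example.net/copy"])
)

# One location, two documents: the fingerprint changes overnight.
changed = [
    url for url in monday
    if fingerprint(monday[url]) != fingerprint(tuesday[url])
]

print(same_document)
print(changed)
```

The URL alone reveals neither fact: the two copies of the specification look unrelated, and the pet page looks unchanged.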

New Addressing Schemes: URNs, URCs, and URIs

Consider the information that describes this book. It may have a unique identifier,
such as an ISBN. It has many characteristics that describe it, such as its cost,
