main motivation for using URIs as the ending resource of a link as opposed to a
specific Web representation is to prevent broken links, where a user-agent follows
a link to a resource that is no longer there, due to the Web representation itself
changing. As put by the TAG, “Resource state may evolve over time. Requiring a
URI owner to publish a new URI for each change in resource state would lead to a
significant number of broken references. For robustness, Web architecture promotes
independence between an identifier and the state of the identified resource” (Jacobs
and Walsh 2004).
However, one of the distinguishing features of the Web is that links may be
broken by any access to a Web representation disappearing, due simply to the Web
representation no longer being hosted, the loss of ownership of the domain name, or some
other reason. These reasons are given in HTTP status codes, such as the infamous
404 Not Found that signals that while there is communication with a server, the
server does not host the resource. Further kinds of broken links are possible, such as
a 301 Moved Permanently response, a 5xx server error, or an inability even to connect
to the server, leading to a time-out error.
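The following minimal sketch, which assumes Python's standard urllib and a hypothetical check_link function of our own devising, illustrates how a user-agent might classify these outcomes from the HTTP response it receives.

import socket
import urllib.error
import urllib.request

def check_link(uri, timeout=5):
    # Attempt to dereference the URI and classify the outcome in terms of
    # the HTTP status codes discussed above. This is an illustrative sketch,
    # not part of any cited system.
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            # urlopen follows redirects (e.g. 301 Moved Permanently) silently,
            # so compare the final URL against the one originally requested.
            if response.geturl() != uri:
                return "moved: now at " + response.geturl()
            return "ok: " + str(response.status)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return "broken: 404 Not Found (server reached, resource not hosted)"
        if 500 <= error.code < 600:
            return "broken: %d server error" % error.code
        return "broken: %d" % error.code
    except (urllib.error.URLError, socket.timeout):
        return "broken: could not connect to the server (time-out or unreachable)"

A real user-agent would also handle the many other 3xx and 4xx status codes, but even this sketch shows how HTTP makes the 'brokenness' of a link observable to the client.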
This ability of links to be 'broken' contrasts with previous hypertext systems. Links were not invented by the Web, but
by the hypertext research community. Constructs similar to links were enshrined in
the earliest of pre-Web systems, such as Engelbart's oNLine System (NLS) (1962),
and were introduced as part of the early hypertext work of Theodor Nelson (1965). The
plethora of pre-Web hypertext systems were systematized into the Dexter Reference
Model (Halasz and Schwartz 1994). According to the Dexter Reference Model,
the Web would not even qualify as hypertext, but as “proto-hypertext,” since the
Web did not fulfill the criterion of “consistency,” which requires that “in creating
a link, we must ensure that all of its component specifiers resolve to existing
components” (Halasz and Schwartz 1994). Ensuring that every link resolves, and is
therefore never broken, requires a centralized link index that maintains the state of
each resource and disallows the creation of links to non-existent or inaccessible
resources. Many early competitors to the Web, like HyperG, had
a centralized link index (Andrews et al. 1995). As an interesting historical aside, it
appears that the violation of this principle, by not maintaining a centralized link index,
was the main reason why the World Wide Web was rejected from its first academic
conference, ACM Hypertext 1991, although Engelbart did encourage Berners-Lee
and Connolly to pursue the Web further. 13 While a centralized link index would
have the benefit of not allowing a link to be broken, the lack of a centralized link
index removes a bottleneck to growth by allowing the owners of resources to link to
other resources without updating any index besides their own Web representations.
This was doubtless important in enabling the explosive growth of linking. Such a
centralized link index, together with an index of Web representations, is also precisely
what search engines like Google create post-hoc through spidering, in order to have
an index of links and web pages that enables their keyword search and page ranking
algorithms. As put by Dan Connolly in response to Engelbart, “the design of the
13 Personal communication with Tim Berners-Lee.