main motivation for using URIs as the ending resource of a link as opposed to a
specific Web representation is to prevent broken links, where a user-agent follows
a link to a resource that is no longer there, due to the Web representation itself
changing. As put by the TAG, “Resource state may evolve over time. Requiring a
URI owner to publish a new URI for each change in resource state would lead to a
significant number of broken references. For robustness, Web architecture promotes
independence between an identifier and the state of the identified resource” (Jacobs
and Walsh 2004).
However, one of the distinguishing features of the Web is that links may be
broken by any access to a Web representation disappearing, due simply to the Web
representation no longer being hosted, the loss of ownership of the domain name, or some
other reason. These reasons are given in HTTP status codes, such as the infamous
404 Not Found that signals that while there is communication with a server, the
server does not host the resource. Further kinds of broken links are possible, such as
a 301 Moved Permanently response, a 5xx server error, or an inability even to connect
to the server, leading to a time-out error.
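The following minimal sketch, which assumes Python's standard urllib and a hypothetical check_link function of our own devising, illustrates how a user-agent might classify these outcomes from the HTTP response it receives.

import socket
import urllib.error
import urllib.request

def check_link(uri, timeout=5):
    # Attempt to dereference the URI and classify the outcome in terms of
    # the HTTP status codes discussed above. This is an illustrative sketch,
    # not part of any cited system.
    try:
        with urllib.request.urlopen(uri, timeout=timeout) as response:
            # urlopen follows redirects (e.g. 301 Moved Permanently) silently,
            # so compare the final URL against the one originally requested.
            if response.geturl() != uri:
                return "moved: now at " + response.geturl()
            return "ok: " + str(response.status)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return "broken: 404 Not Found (server reached, resource not hosted)"
        if 500 <= error.code < 600:
            return "broken: %d server error" % error.code
        return "broken: %d" % error.code
    except (urllib.error.URLError, socket.timeout):
        return "broken: could not connect to the server (time-out or unreachable)"

A real user-agent would also handle the many other 3xx and 4xx status codes, but even this sketch shows how HTTP makes the 'brokenness' of a link observable to the client.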
This ability of links to be 'broken' contrasts with previous hypertext systems. Links were not invented by the Web, but
by the hypertext research community. Constructs similar to links were enshrined in
the earliest of pre-Web systems, such as Engelbart's oNLine System (NLS) (1962),
and were introduced as part of the early hypertext work of Theodor Nelson (1965). The
plethora of pre-Web hypertext systems were systematized into the Dexter Reference
Model (Halasz and Schwartz 1994). According to the Dexter Reference Model,
the Web would not even qualify as hypertext, but as “proto-hypertext,” since the
Web did not fulfill the criterion of “consistency,” which requires that “in creating
a link, we must ensure that all of its component specifiers resolve to existing
components” (Halasz and Schwartz 1994). Ensuring that every link resolves, and is
therefore never broken, requires a centralized link index that maintains the state of
each resource and disallows the creation of links to non-existent or inaccessible
resources. Many early competitors to the Web, like HyperG, had
a centralized link index (Andrews et al. 1995). As an interesting historical aside, it
appears that the violation of this principle, by not maintaining a centralized link index,
was the main reason why the World Wide Web was rejected from its first academic
conference, ACM Hypertext 1991, although Engelbart did encourage Berners-Lee
and Connolly to pursue the Web further. 13 While a centralized link index would
have the benefit of not allowing a link to be broken, the lack of a centralized link
index removes a bottleneck to growth by allowing the owners of resources to link to
other resources without updating any index besides their own Web representations.
This was doubtless important in enabling the explosive growth of linking. Such a
centralized link index, together with an index of Web representations, is also precisely
what search engines like Google create post-hoc through spidering, in order to have
an index of links and web pages that enables their keyword search and page ranking
algorithms. As put by Dan Connolly in response to Engelbart, “the design of the
13 Personal communication with Tim Berners-Lee.