structure of language empirically, which is to be done computationally by the
statistical analysis of actual samples of human language. In other words, it called
for the building of “language processing programs which had a sound philosophical basis”
(Wilks 2005).
One of the six students in the Wittgenstein course that became The Blue Book,
Masterman was directly exposed by Wittgenstein to the conceptual apparatus of
the Philosophical Investigations (Sowa 2006). Twenty years later, she founded
the Cambridge Language Research Unit, where the foundations of information
retrieval were laid by Karen Sparck Jones, a student of Masterman and of
Masterman's husband Richard Braithwaite (Wilks 2007). In her dissertation Synonymy
and Semantic Classification, Sparck Jones proposed “a
characterisation of, and a basis for deriving, semantic primitives, i.e. the general
concepts under which natural language words and messages are categorized”
(Sparck Jones 1986). She did this by applying the statistical 'Theory of Clumps'
of Roger Needham - a theory that was itself one of the first to explicate what
Wittgenstein called “family resemblances” - to words themselves, leading her to
posit that words could be defined in terms of statistical clumps of other words
(Needham 1962). Her technique prefigures much of the later work of the 'statistical
turn' in natural language research, as well as our own term-based statistical notion
of sense developed in the previous two chapters. As she applied her work to ever
larger sources of natural language data, she later abandoned even the open-ended
semantic primitives of Masterman. In her later critique of artificial intelligence, she
argued that one of the key insights of information retrieval is that programs should
take “words as they stand” rather than treating them merely as adjuncts to some logical knowledge
representation system (Sparck Jones 1999). The connection to search engines is
clear: AltaVista, the first modern Web search engine, was created after its inventor,
Mike Burrows, e-mailed Sparck Jones and Needham about techniques in information
retrieval.
Search engines work by analysing existing web-pages, breaking them down
into terms and then mapping those terms and their frequencies in a given web-page
into a large index. Each URI can thus be thought of as a collection of terms
in this search-engine index. As the collection of term frequencies gathered into
the index grows, ranging over larger and larger sources of data such as the Web, it
approximates human language use, as studies in computational linguistics have
shown (Keller and Lapata 2003). Users of a search engine then enter certain terms,
and the search query is matched against the index by the engine's retrieval
algorithms; a minimal sketch of such an index follows below.
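As a concrete illustration of this indexing step, here is a minimal sketch in Python, assuming the web-pages have already been fetched and reduced to plain text; the tokeniser, the toy 'web-pages', and the example URIs are hypothetical simplifications rather than the internals of any particular search engine.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Naive tokeniser: lower-case the text and split on whitespace."""
    return text.lower().split()

def build_index(pages):
    """Build an inverted index mapping each term to the URIs it occurs in,
    together with its frequency in each page.

    `pages` maps a URI to that page's plain text.
    """
    index = defaultdict(dict)  # term -> {uri: frequency}
    for uri, text in pages.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][uri] = freq
    return index

# Hypothetical toy collection of 'web-pages'.
pages = {
    "http://example.org/a": "the cat sat on the mat",
    "http://example.org/b": "the dog chased the cat",
}
index = build_index(pages)
# A query is matched by looking its terms up in the index:
print(index["cat"])  # {'http://example.org/a': 1, 'http://example.org/b': 1}
```

Looking up each query term in such an index directly yields the set of URIs in which the term occurs, which is the starting point for the ranking step described next.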
This matching results in an unordered list of possibly relevant URIs, which for an
index covering the entire Web can range from thousands to millions of URIs. These
URIs are then ranked and ordered using an algorithm such as Google's famous
PageRank, possibly taking user feedback into account (Brin and Page 1998); a
minimal sketch of the PageRank computation follows below.
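The core of PageRank as described by Brin and Page (1998) can be sketched as a simple power iteration over the link graph; the link graph, damping factor, and iteration count below are illustrative assumptions, not the production ranking system, which combines many further signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration over a link graph.

    `links` maps each URI to the list of URIs it links to; every URI that
    appears as a link target must also appear as a key.
    """
    uris = list(links)
    n = len(uris)
    rank = {uri: 1.0 / n for uri in uris}
    for _ in range(iterations):
        new_rank = {uri: (1.0 - damping) / n for uri in uris}
        for uri, outgoing in links.items():
            targets = outgoing if outgoing else uris  # dangling pages spread rank evenly
            share = damping * rank[uri] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page link graph.
links = {
    "http://example.org/a": ["http://example.org/b"],
    "http://example.org/b": ["http://example.org/a", "http://example.org/c"],
    "http://example.org/c": ["http://example.org/a"],
}
print(pagerank(links))
```

Pages that receive many links from highly ranked pages end up with higher scores, which is then combined with the term-matching step above to order the result list.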
User-based relevance feedback works as follows: search engines keep track of which
URIs users actually click on. This stream of clicks from multiple users can be stored
in a query log, and the query log can then be used to improve the discovery and
ranking of URIs. By inspecting which terms lead to which URIs across multiple users,
a set of terms that best describes a URI for its users can be discovered, as sketched
below.
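To make this aggregation concrete, the following toy sketch assumes the query log has already been reduced to (query, clicked URI) pairs, a hypothetical format chosen only for illustration.

```python
from collections import Counter, defaultdict

def terms_describing(query_log, top_n=3):
    """For each clicked URI, find the query terms that most often lead to it.

    `query_log` is a list of (query string, clicked URI) pairs.
    """
    counts = defaultdict(Counter)  # uri -> Counter of query terms
    for query, uri in query_log:
        for term in query.lower().split():
            counts[uri][term] += 1
    return {uri: [term for term, _ in c.most_common(top_n)]
            for uri, c in counts.items()}

# Hypothetical clicks from multiple users.
query_log = [
    ("cambridge language research unit", "http://example.org/clru"),
    ("masterman language research", "http://example.org/clru"),
    ("pagerank paper", "http://example.org/brin-page"),
]
print(terms_describing(query_log))
# e.g. {'http://example.org/clru': ['language', 'research', 'cambridge'], ...}
```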
In this way, typing terms into a search engine can be thought of as