Classic Information-Retrieval Research
The development of experimental information retrieval systems has
usually occurred independently of linguistic theory. In the early 1970s,
a review concluded, “The most striking fact to emerge from the
literature, however, is the difficulty of marrying linguistic techniques
and retrieval objectives” (Sparck Jones and Kay 1973, 197).
Linguistically very crude procedures seemed to work quite well for
retrieval (understood primarily as the transformation of a query into a
set of records), and it was unclear what more sophisticated procedures
could contribute (Sparck Jones and Kay 1973, 197). Similarly, indexing
solutions given to selection problems from natural language owed very
little to linguistics (Gardin 1973, 140). Before interest in
metalanguages became more widely diffused during the late 1980s, only
library schools had accepted that “[humanly assigned] retrieval
terminologies . . . [were] worthy of sustained study and research”
(Roberts 1989, 103). An understanding of languages of description,
particularly those produced by algorithmic transformations on the
language of discourse, had to be constructed from linguistics rather
than imported as an established product. In this context, this topic has
addressed the Janus-like character of information, familiarly regarded
as facing both the technical world of bytes and data compression and the
social world of language and meaning (Gregory 2005). That character also
requires understanding, equally significant but less fully addressed,
from the human and discursive sciences as well as the mathematical and
computational sciences.
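The notion of retrieval as the transformation of a query into a set of
records, carried out by linguistically crude procedures, can be made
concrete with a small sketch. It is an illustration only, not a
reconstruction of any system discussed in the literature cited above:
the function names (tokenize, build_index, retrieve) and the sample
records are hypothetical, and matching is assumed to be on exact
lowercased word forms with no stemming or syntactic analysis.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word forms (no linguistic analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word form to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in tokenize(text):
            index[word].add(record_id)
    return index


def retrieve(index, query):
    """Transform a query into the set of records containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical records, invented purely for illustration.
RECORDS = {
    1: "Linguistic techniques and retrieval objectives",
    2: "Indexing languages for document retrieval",
    3: "Statistical distribution of word forms",
}
INDEX = build_index(RECORDS)
```

With these sample records, the query "linguistic retrieval" yields only
record 1, while the query "linguistics" yields no records at all,
because the procedure compares word forms exactly and knows nothing of
the relation between "linguistic" and "linguistics".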
The value of classic experimental information retrieval research to
current practice and understandings is further reduced by the
understandings developed here. We must question the historically
inherited preoccupation with the word as a unit, and with the
statistical distribution of word forms (Zipf 1936). The received idea of
the word as the unit of meaning should finally be abandoned and replaced
with a more sophisticated model of signification. The degree of success
of information retrieval based on the word unit, and the continuing
utility of individual word-based techniques, result from the coincidence
of the language's semantic component, even if understood as an
associative paradigm, with syntactically separated sequences of the
message. The utility for retrieval of combining established techniques
with developing understandings of the stability of meaning and the
frequency of occurrence and applying them to more