of the frequency of recurrence of words, and particularly of multiword
sequences, can be made to follow from these understandings.
Word
Shannon's own brief but highly illuminating explorations of the structure
of written language, wherein the line of writing is effectively understood
as the message of information theory, yield a definition of the word as “a
cohesive group of letters [of printed English] with strong internal statistical
influences” (Shannon 1951/1993, 197-198). This conception of the
word is consistent with the Saussurean understanding of the word as a
unit that compels recognition by the mind, but is more historically and
medium-specific and less tautological. Shannon's definition of the word is
specific to the written and printed word, not the oral medium, although it
may have some application to handwritten utterances regarded as composed
of discrete characters rather than as a continuous or broken line.
While Shannon's definition of the word has been largely neglected since
its publication in 1951, early signs of its adoption in connection with
the computational parsing of language have appeared,² and it is potentially
highly significant. Most fundamentally, Shannon's definition is congruent
with an emphasis on the material basis for communication and its
concentration on humanly and potentially automatically detectable patterns.
Each element of the definition can also be understood in a specific
material sense, strongly related to linearity. Group can be understood as
spatially grouped, particularly within the line of writing, and separated
from contiguous words within the line by the space; the space itself is
recognized as a character rather than simply the absence of a character.
Cohesive can be understood in a sense both related to and further developed
from that of group: related, as the cohesion or mutual stickiness implied by
the letters grouping together between spaces; and further developed, as
cohesion between units of the word, or letters, which exhibit transitions
between units that are acceptable within that written language. Strong
internal statistical influences imply that particular letters co-occur with
one another, including transition probabilities between two individual
letters but extending beyond immediately contiguous transitions to longer
sequences.
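
As a rough illustration of what transition probabilities between two individual letters might look like in practice, the following Python sketch estimates, for each character, how often each other character immediately follows it. It is not Shannon's own procedure: the function name `transition_probabilities` and the toy `sample` text are hypothetical, chosen only for this example, and a serious estimate would need a large corpus of printed English. In keeping with the passage above, the space is counted as a character in its own right.

```python
from collections import Counter, defaultdict


def transition_probabilities(text):
    """Estimate P(next character | current character) from adjacent pairs.

    Hypothetical helper for illustration; the space is treated as a
    character like any other.
    """
    pair_counts = defaultdict(Counter)
    # Count every pair of immediately contiguous characters.
    for current, following in zip(text, text[1:]):
        pair_counts[current][following] += 1
    # Normalize each character's follower counts into probabilities.
    return {
        current: {
            ch: n / sum(followers.values()) for ch, n in followers.items()
        }
        for current, followers in pair_counts.items()
    }


# Toy text only (an assumption for this sketch), far too small for real estimates.
sample = "the cat sat on the mat and then the hat"
probs = transition_probabilities(sample)

print(probs["t"])  # which characters follow "t", and how often
print(probs[" "])  # which characters follow the space, i.e. begin words
```

Extending the same counting from pairs to longer windows of characters would correspond to the longer sequences mentioned above, at the cost of needing far more text for the estimates to be meaningful.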