understanding of the relative but incomplete redundancy of the space.
While early forms of alphabetic written language do not mark word
boundaries, the absence of the space symbol from Morse code required
the introduction of further telegraphic transmission codes to prevent
confusion over letter boundaries and word-endings in the message received
(Warner 1993, 310-312). The space is a recent introduction to spell
checkers for word processing systems. The reduced text produced by
removing redundancy can be considered an encoded version of the original:
Roughly speaking, ideal prediction collapses the probabilities of various symbols
to a small group more than any other translating operation involving the same
number of letters which is instantaneously reversible. (Shannon 1951/1993, 204)
Analogously, shorthand (diffused in the late nineteenth century and
considered here as a mapping from full to reduced written sequences, even if
primarily intended for the transcribing of oral speech) reduced
redundancy in the full sequence and was reversible. Reduced redundancy might
introduce errors in reversing, particularly if the shorthand message was
transmitted to a destination different from the information source and
not used for private recollection over time, informed by knowledge of the
circumstances of production of the original utterance. 3
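
The idea of a reduced text that remains reversible can be illustrated with a small sketch. It assumes a toy predictor built from counts of which symbol follows the previous two symbols, standing in for the ideal predictor Shannon describes; the 27-symbol alphabet and the names train, ranking, reduce_text and restore_text are hypothetical and chosen for illustration only. Each symbol is replaced by the rank at which the predictor would have guessed it, and a destination holding the identical predictor can reverse the mapping exactly.

```python
from collections import Counter, defaultdict

# Assumed symbol set: lowercase letters plus the space (27 symbols).
# Text containing any other character is not handled by this sketch.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def train(text, order=2):
    """Count which symbol follows each context of up to `order` symbols."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(text):
        counts[text[max(0, i - order):i]][symbol] += 1
    return counts

def ranking(counts, context):
    """Order the alphabet from most to least expected after `context`."""
    seen = counts.get(context, Counter())
    # Ties (including unseen symbols) fall back to alphabet order, so the
    # ranking is deterministic and identical at source and destination.
    return sorted(ALPHABET, key=lambda c: (-seen[c], ALPHABET.index(c)))

def reduce_text(text, counts, order=2):
    """Replace each symbol by its rank (1 = first guess) under the predictor."""
    return [ranking(counts, text[max(0, i - order):i]).index(s) + 1
            for i, s in enumerate(text)]

def restore_text(ranks, counts, order=2):
    """Reverse the mapping by replaying the same predictor over the ranks."""
    out = ""
    for r in ranks:
        context = out[max(0, len(out) - order):]
        out += ranking(counts, context)[r - 1]
    return out

sample = "the strength of the string is the strength of the thing"
model = train(sample)
reduced = reduce_text(sample, model)
assert restore_text(reduced, model) == sample
```

The reversal succeeds here because source and destination share the same predictor. A destination whose predictor differed would generally restore a different sequence, which loosely parallels the shorthand message read away from its circumstances of production.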
Shannon's experiments did not fully distinguish syntactic, or
pattern-based, predictions from the semantic predictions of human subjects,
introducing (possibly partly unconsciously) issues of meaning that are
foreign to information theory's primary concern with expression. Thus,
we can distinguish syntactic from semantic prediction and focus attention
on the syntactic, or pattern-based, level already partially embodied
in text compression programs. The possibilities of prediction are greatest
with high redundancy and low entropy for subsequent characters, when
sequences immediately below the word are considered: for instance, in
written English the sequence “strengt” can normally only be followed by “h”.
These predictive possibilities would be consistent with Shannon's
understanding of the word as “a cohesive group of letters with strong internal
statistical influences” (Shannon 1951/1993, 197-198). Compared with
prediction from a combination of syntactic and semantic considerations,
prediction based on purely syntactic features may cover shorter sequences
and accordingly yield a lower value for redundancy and a higher entropy
for written English.
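
Purely syntactic prediction of this kind can be sketched by estimating the entropy of the next character given the preceding k characters. The sample string and the function names below are hypothetical, and the figures obtained from so short a sample are not estimates for written English; with long contexts and little text, most contexts occur only once and the estimate is biased downward.

```python
import math
from collections import Counter, defaultdict

def next_char_counts(text, k):
    """For each k-character context, count the characters that follow it."""
    table = defaultdict(Counter)
    for i in range(k, len(text)):
        table[text[i - k:i]][text[i]] += 1
    return table

def entropy(counter):
    """Entropy in bits of the empirical distribution held in `counter`."""
    total = sum(counter.values())
    return sum((n / total) * math.log2(total / n) for n in counter.values())

def conditional_entropy(text, k):
    """Average next-character entropy, weighted by how often each context occurs."""
    table = next_char_counts(text, k)
    total = sum(sum(c.values()) for c in table.values())
    return sum(sum(c.values()) / total * entropy(c) for c in table.values())

sample = "strength in strength and strengthen the strengths"

# In this sample the context "strengt" is always followed by "h",
# so the next character carries no uncertainty at that point.
followers = next_char_counts(sample, 7)["strengt"]
assert set(followers) == {"h"}
assert entropy(followers) == 0.0

# Comparing context lengths shows how prediction gains from longer
# sub-word sequences within this sample.
for k in (0, 1, 3, 7):
    print(k, round(conditional_entropy(sample, k), 3))
```

The weighting by context frequency is a design choice that makes the result an average over positions in the text, so that rare contexts do not dominate the estimate.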