If a stream is made up of bytes with values between 0 and 255, and every byte value occurs with equal probability of 1/256, then the entropy of the stream is 8 bits per byte. If only two byte values, say 43 and 95, each occur half of the time and the other 254 values don't occur at all, the entropy of this stream is only 1 bit per byte. In this basic example, it should be obvious how the stream can be compressed by a factor of 8, to 1 bit per byte. In more complex examples, the entropy is still a good rough measure of how well a basic compression algorithm will do.
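To make the calculation concrete, here is a small sketch in Python (my own illustration, not from the text) that computes bits per byte from the observed byte frequencies:

    # A minimal sketch of the per-byte entropy calculation, assuming the
    # stream is available as a Python bytes object (illustration only).
    import math
    from collections import Counter

    def entropy_bits_per_byte(stream: bytes) -> float:
        """Shannon entropy in bits per byte: sum of p * log2(1/p) over byte values."""
        counts = Counter(stream)
        total = len(stream)
        return sum((c / total) * math.log2(total / c) for c in counts.values())

    # A uniform stream over all 256 values gives 8 bits per byte;
    # a stream of only two equally likely values gives 1 bit per byte.
    print(entropy_bits_per_byte(bytes(range(256))))        # 8.0
    print(entropy_bits_per_byte(bytes([43, 95] * 1000)))   # 1.0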
The limitations of Shannon's measure of information are pretty obvious. An information stream that repeats the bytes 0, 1, 2, ..., 254, 255 ad infinitum would appear to contain 8 bits of information per byte. But there really isn't that much information being
conveyed. You could write a short two-line program in most com-
puter languages that would duplicate the result. This computer pro-
gram could stand in for this stream of information and it would be
substantially cheaper to ship this program across the network than it
would be to pay for the cost of sending an endless repeat stream of
bytes.
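For instance, a sketch of such a generator in Python (my own illustration; the text does not give one) might look like this:

    # Writes the repeating 0, 1, 2, ..., 255, 0, 1, ... byte stream forever,
    # standing in for the endless stream it describes (illustration only).
    import sys, itertools
    for b in itertools.cycle(range(256)):
        sys.stdout.buffer.write(bytes([b]))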
In a sense, this short program for the repeating record is a good compressed form of the information. If the data were potato chips, you would hope that it was measured by the number of lines in a computer program that could generate it, not by the Shannon entropy.
There is another measure of information known as the Kolmogorov
complexity that attempts to measure the information by determin-
ing the size of the smallest program that could generate the data.
This is a great theoretical tool for analyzing algorithms, but it is en-
tirely impractical. Finding the smallest program is both theoretically
and practically impossible because no one can test all possible pro-
grams. It might be a short program in C, but how do we know the
length in Pascal, Smalltalk, or a language that no one has written yet?
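Kolmogorov complexity itself can't be computed, but the size of a compressed copy of the data gives a crude, computable upper bound on how small a description could be. The following sketch is my own illustration using Python's zlib, not a method suggested by the text:

    # The compressed size is an upper bound on how short a description of the
    # data could be (a practical stand-in for the idea, not an exact measure).
    import zlib

    data = bytes(range(256)) * 1000        # 256,000 bytes of the repeating pattern
    packed = zlib.compress(data, 9)        # maximum compression level
    print(len(data), len(packed))          # the compressed form is far smaller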
The Shannon measure of information can be made more complicated by including the relationship between adjacent characters:

    \sum_{i,j} \rho(x_i \mid x_j) \log \frac{1}{\rho(x_i \mid x_j)},

where ρ(x_i|x_j) means the probability that x_i follows x_j in the information stream. The sum is computed over all possible combinations. This measure does a good job of picking up some of the nature of the English language. The occurrence of a letter varies significantly. “h” is common after a “t” but not after a “q”. This measure would also pick up the pattern in the example of 0, 1, 2, ..., 255, 0, 1, ...
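As a concrete illustration (my own; the probabilities ρ(x_i|x_j) are assumed to be estimated from counts of adjacent byte pairs in a sample of the stream), the sum above could be evaluated like this:

    # Evaluates the sum of p(xi|xj) * log2(1 / p(xi|xj)) over observed pairs,
    # with p(xi|xj) estimated from digram counts (an assumption for illustration).
    import math
    from collections import Counter

    def digram_measure(stream: bytes) -> float:
        pair_counts = Counter(zip(stream, stream[1:]))   # counts of xj followed by xi
        follow_totals = Counter(stream[:-1])             # how often each xj has a successor
        total = 0.0
        for (xj, xi), c in pair_counts.items():
            p = c / follow_totals[xj]                    # estimate of p(xi | xj)
            total += p * math.log2(1.0 / p)
        return total

    # In the repeating stream each byte is always followed by the same value,
    # so every conditional probability is 1 and the measure is 0.0.
    print(digram_measure(bytes(range(256)) * 100))
    print(digram_measure(b"the quick brown fox jumps over the lazy dog"))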
But there are many slightly more complicated patterns that could slip past even this measure.