so it would be statistically correct, but it did not explain how to hide
information in the process. Nor did it explain how to run Huffman
compression in reverse.
The information is hidden by letting the data to be concealed
dictate the choice of the next letter. In the example described above,
either “a”, “e”, or “o” could follow the starting letters “The l”. It is easy
to come up with a simple scheme for encoding information. If “a”
stands for “1”, “e” stands for “2” and “o” stands for “3”, then common
numbers could be encoded in the choice of the letters. Someone at
a distance could recover this value if they had a copy of the same
source text, S, that generated the table of statistics. They could look
up “The l” and discover that there are three letters that follow “he l”
in the table. The letter “e” is the second choice in alphabetical order,
so the letter “e” stands for the message “2”.
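As a sketch, this simple scheme is easy to express in a few lines of
Python. The table of statistics here is a toy stand-in for one built
from a real source text, and the names are illustrative only:

    # Letters seen to follow the prefix "The l", a toy stand-in for
    # a table built from some shared source text S.
    TABLE = {"The l": ["a", "e", "o"]}

    def encode_digit(prefix, digit):
        # Hide a small number by picking the digit-th letter,
        # counting in alphabetical order ("a" = 1, "e" = 2, "o" = 3).
        return sorted(TABLE[prefix])[digit - 1]

    def decode_digit(prefix, letter):
        # Recover the number from the letter's alphabetical position.
        return sorted(TABLE[prefix]).index(letter) + 1

    letter = encode_digit("The l", 2)                # picks "e"
    print(letter, decode_digit("The l", letter))     # e 2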
A long text like the one shown in Figure 6.1 could hide a different
number in each letter. If there were no choice about the next letter
to be added to the output, though, then no information could be
hidden. That letter would not hide anything.
Simply using a letter to encode a number is not an efficient or a
flexible way to send data. What if you wanted to send the message
“4” and there were only three choices? What if you wanted to send
a long picture? What if you wanted to send the value “1”, but the
first letter was the least common choice? Would this destroy the
statistical composition?
Running Huffman codes in reverse is the solution to all of these
problems. Figure 6.2 shows a simple Huffman tree constructed from
the three choices of letters to follow “The l”. The tree was constructed
using the statistics that showed that the letter “e” followed in 16 out of
the 20 times while the letters “a” and “o” both followed twice apiece.
Messages are encoded with a Huffman tree like this with a vari-
able number of bits. The choice of “e” encodes the bit “0”; the choice
of “a” encodes “10”; and the choice of “o” encodes the message “11”.
These bits can be recovered at the other end by reversing this choice.
The number of bits that are hidden with each choice of a letter varies
directly with the number of choices that are possible and the proba-
bilities that govern the choice.
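A minimal Python sketch of this reversal follows. The code table is
hard-coded to match the tree in Figure 6.2 rather than rebuilt from the
statistics, and the function names are illustrative; a real system
would construct a fresh tree for each prefix:

    CODES = {"e": "0", "a": "10", "o": "11"}    # letter -> codeword
    DECODE = {bits: ltr for ltr, bits in CODES.items()}

    def hide(bits):
        # Consume the hidden bits, emitting a letter each time a
        # complete codeword has been read off.
        out, buf = [], ""
        for b in bits:
            buf += b
            if buf in DECODE:
                out.append(DECODE[buf])
                buf = ""
        return "".join(out)     # leftover bits would need padding

    def recover(letters):
        # Reverse each choice: every letter yields its codeword back.
        return "".join(CODES[ltr] for ltr in letters)

    stego = hide("0100011")                     # -> "eaeeo"
    print(stego, recover(stego) == "0100011")   # eaeeo True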
There should generally be more than three choices available if the
source text S is large enough to offer some variation, but there will
rarely be a full 26 choices. This is only natural because English has
plenty of redundancy built into the language. Shannon recognized
this when he set up information theory. If the average entropy of En-
glish is about 3 bits per character, then this means that there should
only be about 2³ or eight choices that can be made for the next
character.
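These figures are easy to check with a short calculation, sketched
here with the example counts from Figure 6.2 (16, 2, and 2 out of 20):

    from math import log2

    probs = [16/20, 2/20, 2/20]
    entropy = -sum(p * log2(p) for p in probs)
    print(round(entropy, 3))   # about 0.922 bits hidden per letter
    print(2 ** 3)              # 3 bits/char of entropy -> 8 choices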