Cryptography Reference
In-Depth Information
1
0
a
0
1
c
b
Figure 6.4: An ordinary Huffman tree built for three characters, “a”,
“b”, and “c” that occur with probabilities of 50%, 37.5%, and 12.5%
respectively.
Figure 6.5 shows a new version of the Huffman tree designed to
balance the distribution. There are now two extra layers added to the
tree. The branching choices made in these extra two layers would
use extra bits supplied by a pseudo-random generator. When they
were recovered, these bits would be discarded. It should be easy to
establish that “b” will emerge 37.5%of the time and “c” will be output
12.5% of the time if the data being hidden is perfectly distributed.
The cost of this process is efficiency. The new tree may produce
output with the right distribution, but decoding is often not possible.
The letter “b” is produced from the leaves with addresses 100, 101,
and 110. Since only the first bit remains constant with the tree in
Figure 6.5 then only one bit can be hidden with the letter “b”. The
other two bits would be produced by the pseudo-random bit stream
and not recovered at the other end. The tree in Figure 6.4 would hide
two bits with the letter “b”, but it would produce a “b” 25% of the
time. This is the trade-off of efficiency versus accuracy.
How many bits are hidden or encoded if a “c” is output? It could
either be three that are encoded when a 111 is found in the input file
or it could be one bit padded in the same manner as the letter “b”.
Either choice is fine.
This technique can be extended significantly to support any
amount of precision. The most important step is to make sure that
there will be no ambiguity in the decoding process. If the same char-
acter exists on both branches, then no bit can be encoded using any
of the subtree descending from this point.
Thismeansthatitisnotpossibletoencodedatawhichisdom-
Search WWH ::




Custom Search