5. If the characters are different, then one bit can be encoded with
the choice. If you are hiding a 0 using this mimicry, then output
the character found beginning at position i. If you are hiding
a 1, then output the character found after the search began at
(i + k/2) mod k.
This solution can be decoded. All of the information encoded
here can be recovered as long as both the encoder and the decoder
have access to the same source file and the same stream of
values
coming from a pseudo-random source. The pseudo-random gener-
ator ensures that all possible combinations are uncovered. This does
assume, however, that the candidates of n − 1 characters are evenly
distributed throughout the text.
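Step 5 above can be sketched in a few lines of Python. This is a hypothetical helper, not the book's implementation: `text` stands in for the shared source file of length k, `context` is the previous n − 1 characters, and `rng` stands in for the shared pseudo-random generator that both encoder and decoder replay.

```python
import random

def fishing_encode_bit(text, context, bit, rng):
    """Sketch of the one-bit fishing step: hide a bit by choosing
    which occurrence of `context` to continue from.  Illustrative
    names; assumes encoder and decoder share `text` and `rng`."""
    k = len(text)
    i = rng.randrange(k)                      # shared pseudo-random start
    # Hiding a 0: search from i.  Hiding a 1: search from (i + k/2) mod k.
    start = i if bit == 0 else (i + k // 2) % k
    doubled = text + text                     # makes the search circular
    pos = doubled.find(context, start)
    if pos == -1:
        return None                           # context absent from the file
    # Output the character that follows the context found by the search.
    return text[(pos + len(context)) % k]
```

The decoder repeats both searches with the same pseudo-random value and checks which of the two candidate characters matches the output; as the text notes, the step only applies when the two candidates differ.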
The solution can also be expanded to store more than one bit per
output letter. You could begin the search at four different locations
and hope that you uncover four different possible letters to output.
If you do, then you can encode two bits. This approach can be ex-
tended still further, but each search does slow the output.
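The two-bit extension can be sketched the same way, under the same assumptions (shared `text`, `context`, and `rng`; all names are illustrative). Four searches begin at evenly spaced offsets, and the choice among four distinct continuations carries two bits:

```python
import random

def fishing_encode_two_bits(text, context, bits, rng):
    """Sketch of the multi-bit variant: search from four evenly spaced
    start positions; if all four continuations differ, the choice
    among them encodes two bits (`bits` is an integer 0..3)."""
    k = len(text)
    i = rng.randrange(k)
    starts = [(i + j * (k // 4)) % k for j in range(4)]
    doubled = text + text                     # circular search
    chars = []
    for s in starts:
        pos = doubled.find(context, s)
        if pos == -1:
            return None                       # context absent
        chars.append(text[(pos % k + len(context)) % k])
    if len(set(chars)) < 4:
        return None          # fewer than four distinct letters: fall back to one bit
    return chars[bits]
```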
In general, the fishing solution is the slowest and most cumber-
some of all the approaches. Looking up each new letter takes an
amount of time proportional to the occurrence of the n − 1 character
group in the data. The array has the fastest lookup, but it can be
prohibitively large in many cases. The tree has the next fastest lookup
and is probably the most generally desirable for text applications.
6.3.1 Goosing with Extra Data
Alas, statistical purity is often hard to generate. If the data to be hid-
den has maximum entropy, then the letters that emerge from the
Huffman-tree based mimicry will emerge with a probability distri-
bution that seems a bit suspicious. Every letter will appear with a
probability of the form 1/2^i; that is, 50%, 25%, 12.5%, and so on. This
may not be that significant, but it might be detected.
Music is also fair game. Many have experimented with using musical
rules of composition to create new music from statistical models of
existing music. One paper on the topic is [BJNW57].
Better results can be obtained by trading off some of the efficiency
and using a pseudo-random number generator to add more bits to
make the choice better approximate the actual occurrence in the
data.
This technique can best be explained by example. Imagine that
there are three characters, “a”, “b”, and “c” that occur with probabil-
ities of 50%, 37.5%, and 12.5% respectively. The ordinary Huffman
tree would look like the one in Figure 6.4. The character “a” would
occur in the output file 50% of the time. This would be fine. But “b”
and “c” would both occur 25% of the time. “b” will occur as often as
“c”, not three times as often as dictated by the source file.
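A quick simulation makes the skew concrete. Assuming the tree of Figure 6.4 reaches “a” with a single bit and “b” and “c” through a second bit, feeding it maximum-entropy (uniformly random) hidden bits produces the 50/25/25 split described above, not the source file's 50/37.5/12.5:

```python
import random

# Drive the assumed Huffman tree with uniformly random bits and
# tally the letters it emits.  Tree layout assumed from Figure 6.4:
# bit 0 -> "a"; bit 1 then 0 -> "b"; bit 1 then 1 -> "c".
rng = random.Random(42)
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(100_000):
    if rng.getrandbits(1) == 0:
        counts["a"] += 1
    elif rng.getrandbits(1) == 0:
        counts["b"] += 1
    else:
        counts["c"] += 1
# "a" lands near 50%, but "b" and "c" each land near 25%: the
# mimicry cannot reproduce the 37.5%/12.5% source statistics.
```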