Of course, the frequency distribution of characters depends on the language
itself; for example, the three most frequent letters in Spanish are e, a, and o,
with frequencies of about 0.137, 0.125, and 0.087, respectively. Moreover, natural
languages also have characteristic (and very uneven) frequency distributions for
n-grams, and those for digrams and trigrams in particular are easily computed by
counting their occurrences in a sufficiently large amount of text. These frequency
distributions are preserved when the text is encrypted with a substitution cipher.
Since every occurrence of a given plaintext character is always encrypted in the
same way, the frequency distribution of characters in the ciphertext is the same as
in the plaintext, except that the frequencies are now attached to different letters.
Thus a substitution cipher can be cryptanalyzed in a ciphertext-only attack by
comparing the frequency distribution of characters in the ciphertext with the known
frequency distribution of characters in the plaintext language (which Eve knows,
according to Kerckhoffs' principle). This is a trial-and-error process, because the
character frequencies of a particular plaintext need not match exactly those expected
for its language, and the process becomes easier the longer the ciphertext at Eve's
disposal.
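To see concretely that a substitution only relabels the frequencies, here is a minimal Maple sketch. The key perm is an arbitrary example chosen for illustration, and the substitution is applied with StringTools:-CharacterMap; as in the example later in this section, ciphertext letters are written in upper case.

```maple
with(StringTools):
# Arbitrary example key: plaintext letter alph[i] is encrypted as perm[i].
alph := "abcdefghijklmnopqrstuvwxyz":
perm := "QWERTYUIOPASDFGHJKLZXCVBNM":
p := "attack at dawn":
# Encrypt letter by letter; the space is not in alph, so it is left unchanged.
c1 := CharacterMap(alph, perm, p);
# Sorted lists of letter counts (letters only) for plaintext and ciphertext.
fp := sort(map(rhs, select(x -> IsAlpha(lhs(x)), [CharacterFrequencies(p)])));
fc := sort(map(rhs, select(x -> IsAlpha(lhs(x)), [CharacterFrequencies(c1)])));
# The multisets of counts coincide; only the letters carrying them differ.
evalb(fp = fc);
```

For instance, a occurs four times in p and is encrypted as Q, which then occurs four times in c1.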
There are extreme cases, such as the novel Gadsby by Wright [199], which does
not contain the letter e. But even this ad hoc rarity would not prevent a cryptanalyst
from deciphering an encrypted version of this book, because the distribution of the
remaining letters will be close to the expected one. In short, the cryptanalysis of
the simple substitution cipher is very easy, as the literary examples given by Poe
("The Gold-Bug") and Conan Doyle ("The Adventure of the Dancing Men") show. The
attack is even easier if Eve has known plaintext (or can choose plaintext or
ciphertext), since then she only needs to compare the letters in the plaintext with
the corresponding letters in the ciphertext, which reveals the key, or at least the
part of it involving the letters that occur.
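A minimal Maple sketch of such a known-plaintext attack, assuming Eve holds the made-up plaintext/ciphertext pair below: pairing the letters position by position recovers part of the key, which can then be used to decrypt other ciphertexts built from the same letters.

```maple
with(StringTools):
# Hypothetical known plaintext/ciphertext pair; spaces were not encrypted.
p := "attack at dawn":
c2 := "QZZQEA QZ RQVF":
# Pair plaintext and ciphertext letters position by position; the set
# removes repetitions, leaving one equation per distinct plaintext letter.
pairs := {seq(`if`(IsAlpha(p[i]), p[i] = c2[i], NULL), i = 1 .. length(p))};
# Use the recovered part of the key (ciphertext letter -> plaintext letter)
# to decrypt another ciphertext that contains only these letters.
L := convert(pairs, list):
CharacterMap(cat(op(map(rhs, L))), cat(op(map(lhs, L))), "QZ RQVF");
```

Only the seven letters that occur in the known plaintext are recovered; the rest of the key stays unknown. If some plaintext letter appeared in two equations with different right-hand sides, the pair could not have come from a simple substitution.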
We will now sketch an example of cryptanalysis of a substitution cipher. Suppose
we are given the following ciphertext, which we know (Kerckhoffs' principle!) was
encrypted with a substitution cipher over the English alphabet. We write the
ciphertext in upper case and assume that the plaintext is in lower case, a convention
that makes partially decrypted ciphertexts easy to read. Note that spaces and
punctuation symbols were not encrypted.
> c := "Z MPLAO XIDG FPAO XGW FXG FWLFX, FXIF FXG BIYG MIAMLAIFZPU SXZMX XIO BGWDGO YG
KPW OGMZEXGWZUJ FXG YIULBMWZEF XIO GUINAGO YG FP AGIWU FXG SPWO, NLF PU I MIEWZMG
ZF BFWLMR YG FP FGAA XGW FXIF I JGUZG XIO WGDGIAGO ZF FP YG. FXZB KIABG OZBMAPBLWG
KGFFGWGO YIOIYG OLWKG FP YG. FXIF OIV Z NGMIYG FXG YIBFGW PK XGW BPLA, IUO Z INLBGO
YV EPSGW. GDGWV FZYG Z FXZUR PK ZF, Z IY OZBFWGBBGO IUO IBXIYGO, IUO Z OP EGUIUMG
UPS ZU FXG PNAZJIFZPU LUOGW SXZMX Z EAIMG YVBGAK PK FGAAZUJ FXG FWLFX ZU SWZFZUJ YV
YGYPZWB.":
The first step in cryptanalyzing this ciphertext is a frequency analysis. This is easy
in Maple, since the function CharacterFrequencies of the StringTools package
(which we assume is already loaded) does precisely this.
> freq := CharacterFrequencies(c);
freq := " " = 97, "," = 5, "." = 4, "A" = 17, "B" = 17, "D" = 4, "E" = 6, "F" = 40,
    "G" = 56, "I" = 33, "J" = 5, "K" = 8, "L" = 12, "M" = 14, "N" = 5, "O" = 24,
    "P" = 21, "R" = 2, "S" = 6, "U" = 20, "V" = 5, "W" = 24, "X" = 27, "Y" = 19,
    "Z" = 31
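The output above is ordered by character, which is not the most convenient form for comparison with English letter frequencies. A possible next step, sketched here under the assumption that only letters matter (spaces and punctuation were not encrypted), is to discard the other characters and sort by decreasing count. RankLetters is a hypothetical helper name, not part of StringTools.

```maple
# Hypothetical helper: rank the letters of a ciphertext by decreasing
# frequency, ignoring spaces and punctuation (which were not encrypted).
RankLetters := proc(s::string)
  local letters;
  # Keep only the equations whose left-hand side is a letter.
  letters := select(x -> StringTools:-IsAlpha(lhs(x)),
                    [StringTools:-CharacterFrequencies(s)]);
  # Sort by count, largest first; ties may come out in either order.
  sort(letters, (a, b) -> evalb(rhs(a) > rhs(b)));
end proc:
```

With c as defined above, RankLetters(c) begins "G" = 56, "F" = 40, "I" = 33, so G is the natural candidate for e. Since the three-letter group FXG occurs repeatedly as a word, a first guess is that it stands for "the"; CharacterMap("FXG", "the", c) then shows the guessed letters in lower case among the still-encrypted upper-case ones, and further guesses can be tested in the same way.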