Limited Entropy Dot Com Not so random thoughts on security featured by Eloi Sanfèlix

11Apr/091

Crypto Series: Vigenère’s Cipher (2)

As I promished, we're gonna see a different method to obtain the key length used to encrypt a text using Vigenère's algorithm. This is a method somewhat more difficult to understand than Kasiski's method, since it requires some mathematical analysis to obtain the recipe.

Friedman's test or the incidence of coincidences

This method, discovered by William F. Friedman in the 1920s, is based on computing the index of coincidences of the cryptogram's letters. The idea is that for two random letters from the cryptogram to be the same, there is a possibility that they were also the same in the original plaintext if the number of letters they have in between is a multiple of the key length.

Basically, we'll take the X first letters of the cryptogram and the X last letters, and count the number of coinciding letters in the same position. Finally, we'll divide this number by the number of letters taken and then we will have the index of coincidence.

Considering a source providing independent characters with the frequency distribution of English, and uniformly distributed characters for the key (i.e. all letters with the same frequency, 1/26 for the English alphabet), we have that:

  • The probability that any two letters are the same is approximately 0.0385 when X is not a multiple of the key length
  • The probability that any two letters are the same is approximately 0.0688 when X is a multiple of the key length.

So, with this process wi can determine that high values for the index of coincidence will mean that the shifted distance X is a multiple of the key length, and this way we will determine the most likely key length.

Let's see how we get these probabilities, so that we are able to obtain them in case of having a language different than English. We simply have to consider that for any two ciphertext characters c_i and c_j to coincide, the following relation must hold:

c_i = ( m_i + k_{i mod L} ) = ( m_j + k_{j mod L} ) = c_j

Then, we consider two different cases: if L divides i-j, then m_i = m_j , since in that case we have thatk_{i mod L} = k_{j mod L}   . So, the probability for this case is:

Pr[c_i=c_j] = =Pr[m_i=m_j]= sum_m Pr[m_i=m_j=m]=sum_m p(m)^2 approx 0.0688

However, when i-j is not a multiple of L, then for the two ciphertext characters to be equal the following equation needs to hold

Pr[c_i=c_j] = Pr[ k_{j mod L} = m_i + k_{i mod L} - m_j

But as we said before, the distribution of key characters k_j is uniform, and therefore this probability is:

Pr[c_i=c_j] = frac{1}{26} approx 0.0385

That's it for today. This time there is no example, but stay tuned cause we'll see an exercise soon 🙂

And I hope next week I'm able to post some practical exercise using Cryptool to analyze a Vigenère cipher or something alike.

Posted by Eloi Sanfèlix

Comments (1) Trackbacks (0)

Leave a comment

No trackbacks yet.