1: # Simple-xor encryption (which is very insecure) works like
   2: # this:
   3: #   $cipher[$i] = chr(ord($plain[$i]) ^ $key[$i%@key])
   4: # Double-xor has an extra key, like this:
   5: #   $cipher[$i] = chr(ord($plain[$i]) ^ $key1[$i%@key1]
   6: #                     ^ $key2[$i%@key2])
   7: # The claim has been made that double-xor encryption has an
   8: # effective key length equal to the product of the lengths
   9: # of the two keys, making it much more secure than
  10: # simple-xor.  This is not true.  Below is a very
  11: # efficient known plaintext attack that recovers the two
  12: # xor keys, given a number of known bytes about equal to
  13: # the sum (not the product) of the key lengths.  This is
  14: # done using base-2 linear algebra -- I thought the matrix
  15: # reduction step was kind of cool.  This attack is easily
  16: # generalized to triple-xor or higher.
  17: 
  18: # Note: If you xor every element in the two keys by a
  19: # constant value, the constant will cancel out when you do
  20: # the encryption.  The keys determined by the method
  21: # presented here might therefore be off by a constant, but
  22: # it doesn't matter because they will encrypt and decrypt
  23: # just the same.
  24: 
  25: # Getting the known plaintext is surprisingly easy.  The
  26: # upper bits of ASCII text are very nonrandom (for
  27: # instance, since lower case letters are most common, bits
  28: # 5 and 6 are usually 1).  This makes it possible to work
  29: # out the upper bits of the xor keys using other
  30: # techniques.  Once that is done, you can guess which
  31: # ciphertext characters correspond to spaces in the
  32: # plaintext.  A little more guesswork is often needed -
  33: # try common words that are the right length.  I've been
  34: # able to successfully attack messages of length about 3
  35: # times the total length of the keys.  A skilled
  36: # cryptanalyst could probably break an even shorter
  37: # message.
  38: 
  39: # Input:
  40: #   $keylen1, $keylen2: Length of the two xor keys
  41: #   $cipher: The enciphered message
  42: #   %plain: Known (or guessed) plaintext bytes.  Hash keys
  43: #           are the position in the message
  44: 
  45: # Build the coefficient matrix.  Since it's very sparse,
  46: # the rows are stored as hashes rather than arrays
  47: my $columns = $keylen1 + $keylen2;
  48: my @matrix;
  49: while (my ($pos, $ch) = each %plain) {
  50:   my $i = $pos % $keylen1;
  51:   my $j = $pos % $keylen2 + $keylen1;
  52:   my $val = ord($ch) ^ ord(substr $cipher, $pos, 1);
  53:   push @matrix, { $i=>1, $j=>1, val=>$val, "pos$pos"=>1 };
  54: }
  55: 
  56: # Row-reduce the matrix
  57: my $rank = 0;
  58: foreach my $col (0..$columns-1) {
  59:   # choose a pivot row which depends on $col
  60:   my $i;
  61:   for ($i = $rank; $i < $columns; $i++) {
  62:     last if $matrix[$i]->{$col};
  63:   }
  64:   next if $i >= $columns;
  65:   my $pivot = $matrix[$i];
  66:   @matrix[$rank, $i] = @matrix[$i, $rank]
  67:       unless $i == $rank;
  68:   $rank++;
  69:   # remove the dependence on $col from the other rows
  70:   foreach my $row (@matrix) {
  71:     next if $row == $pivot || !$row->{$col};
  72:     $row->{$_} ^= $pivot->{$_} or delete $row->{$_}
  73:         for keys %$pivot;
  74:   }
  75: }
  76: 
  77: # Interpret results
  78: if ($rank < $columns-1) {
  79:   # Matrix is underdetermined
  80:   print "Not enough information - guess more plaintext\n";
  81: }
  82: foreach my $row (@matrix[$rank .. $columns-1]) {
  83:   # The rows above $rank provide a check that the guessed
  84:   # plaintext is consistent
  85:   if ($row->{val}) {
  86:     print "Inconsistent plaintext given\n";
  87:     # Examining the "posN" entries would give information
  88:     # about which guesses were wrong
  89:     last;
  90:   }
  91: }
  92: # Even if the matrix is underdetermined or inconsistent,
  93: # decrypting with the inferred key values may help improve
  94: # the guesses
  95: my @key = (0) x $columns;
  96: foreach my $row (@matrix[0 .. $rank-1]) {
  97:   my $i;
  98:   for ($i = 0; $i < $columns; $i++) {
  99:     last if $row->{$i};
 100:   }
 101:   $key[$i] = $row->{val} || 0;
 102: }
 103: print "Key: @key\n";

Replies are listed 'Best First'.
No.
by boo_radley (Parson) on Aug 23, 2001 at 23:54 UTC
    --
    Double xor 'encryption' really just reduces down to single xor encryption, especially when done per-byte.
      True. Double-xor reduces to simple-xor, but with a key that's as long as the product (LCM, actually) of the original subkey lengths. Simple-xor is an unbreakable one-time pad if the key is as long as the message*. The technique above can often break a double-xor message with much less ciphertext than the LCM of the subkey lengths.

      * and if the key isn't reused for another message, and if the key isn't guessable for some other reason, yada yada.

Re: Breaking double-xor encryption
by John M. Dlugosz (Monsignor) on Aug 24, 2001 at 08:59 UTC
    RC4 is so simple, I don't see why anyone would use XOR or other toy code!
      Yeah. Any time you tell someone they shouldn't do something (like invent their own cryptosystem), they'll try to do it anyway. Problem with crypto is that they can't tell when they've failed. Unfortunately, RC4 has its problems, too, but it's still better than rolling your own.

      I have to come clean about something - I said I was attacking messages with 3*keylen ciphertext bytes, which is true but misleading. I don't have a really good general method for solving the upper bits of the keys from such a short message. I was doing that with about 7*keylen bytes, until I started focusing on specific weaknesses in CipherText for that step. I'm not telling everything here, because it only has a narrow applicablity, and I don't want Steeeeeve to plug up the problems with his cipher and come back with a new version.

      And because that part of my code is ugly. Ick.

        Yow.

        I don't even understand what he was talking about, re domain stuff!

        Sounds like he's back in the 15th century, with his point about "cat" producing different cyphertext each time it's (this one uses the appostrophe) encountered.

        —John