oddmedley has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I'm doing things with Crypt::RC4 (via a modified PDF::API2 library) and have found that the same input doesn't always produce the same output, as it should. I'm currently at a loss to figure out why and would really like some help.

What's happening is that when it's called from a) my webpage, the encrypted output blows up to many times the size of the input after several iterations. When I call it from b) a CGI test script, apparently with identical arguments, it behaves nicely and does what I think it should. In both cases the outputs are repeatable.

Diving into the RC4 package, I found that when I have a message string of, say, 32 bytes, in case a) the line:
my @message = unpack "C*", substr($message, $piece * $MAX_CHUNK_SIZE, $MAX_CHUNK_SIZE);
in the RC4 sub, produces an array of 37 characters! Even when I alter the substr call and set the last argument to 32, the size of @message is still 37... When I feed the output of RC4 back into itself (as in the algorithm for producing the O value for a PDF file encryption dictionary, revision 3), the size of @message blows up to several kB, when it should stay at 32 bytes...

I've checked the arguments to the RC4 sub in cases a) and b) and they are the same, and yet the output is different. Does anyone know of any reason this could happen? If it helps, the inputs to the RC4 function are (converted to hex so you can read them; the actual function gets the raw strings):
RC4(36A756FC8BA497CA34532CA4A1086AD0, 637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6)
The first iteration output (in hex) in case a) is:
DFB79756C2D1B49F540DC80A3C39290C6F6B61539635BE03CC2E82990C6A974503ABBC84A2
and in case b):
DFB79756C2D1B49F540DC80A3C44D81C7F509ED0787494557D82402EE1FE96FB

You can see they are the same up to character 13 (the 26th character in the hex representation). It seems like somehow in case a) some strange characters are getting into the end of the string that in effect are saying to unpack, 'hey, keep reading characters, go on!'. Any clues? Let me know if you need any more info. Thanks,

Re: perplexing inconsistency using RC4 and unpack
by Anonymous Monk on Aug 01, 2009 at 03:00 UTC
    Please show a small code sample that demonstrates the problem.
      It's difficult to show a small sample, really. On further investigation it seems unpack is really not doing what I'd expect it to. The message substr: 637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6 is correctly unchanged all the way until we unpack it:
      my $ms = $ml < $MAX_CHUNK_SIZE ? $ml : $MAX_CHUNK_SIZE;
      for my $piece ( 0..$num_pieces - 1 ) {
          my $ss = substr($message, $piece * $MAX_CHUNK_SIZE, $ms); # fine up to here...
          my $ssl = length $ss;
          my @message = unpack( "C*", $ss );
          ### at this point the characters are changed... we go from 32 to 37 characters (in the array). why?
          # i've changed the code in the RC4 package here slightly to allow testing.
          # it produces identical results in all cases considered here.
      then it becomes (pack'ing it again and converting to hex for display):
      637573746F6D65726E616D6528C2BF4E5E4E75C28A4164004E56C3BFC3BA01082E2E00C2B6
      looking at these 2 strings you can see some characters are being inserted:
      637573746F6D65726E616D6528   BF 4E5E4E75   8A 4164004E56   FF   FA 0108 2E2E00   B6 #correct
      637573746F6D65726E616D6528 C2BF 4E5E4E75 C28A 4164004E56 C3BF C3BA 0108 2E2E00 C2B6 #faulty

      so unpack is somehow inserting those C2 (Â in Latin-1) characters and changing FF FA to C3BF C3BA

      what could be making unpack behave in this way?

        You are suffering from UTF-8 expansion:

        #!/usr/bin/perl -wl
        print unpack "U0H*", "\x{BF}";
        print unpack "U0H*", "\x{FF}";
        print unpack "U0H*", "\x{FA}";
        __END__
        c2bf
        c3bf
        c3ba

        Now you just need to figure out where the UTF-8 expansion is sneaking in.

        - tye        
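
The expansion can be reproduced against the exact message from the question. The following is a minimal sketch, not the code path inside Crypt::RC4 or PDF::API2: it assumes the core Encode module is available, rebuilds the 32-byte message from its hex dump, and encodes it as UTF-8 as if every byte were a Latin-1 character. Each byte of 0x80 or above then becomes two bytes, which accounts for the five extra bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The 32-byte message from the question, rebuilt from its hex dump.
my $hex   = '637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6';
my $bytes = pack 'H*', $hex;

# Treat each byte as a Latin-1 code point and encode it as UTF-8.
# Bytes below 0x80 are unchanged; bytes 0x80..0xFF become two bytes each.
my $expanded     = encode('UTF-8', $bytes);
my $expanded_hex = uc unpack 'H*', $expanded;

printf "%d -> %d bytes\n", length $bytes, length $expanded;
print "$expanded_hex\n";
```

This prints `32 -> 37 bytes` followed by the "faulty" hex string from the question, so the five bytes BF, 8A, FF, FA and B6 are the ones that grew.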

        I don't really follow, but why not try unpack "H*" without the "C*" step?
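
One way to narrow down where the expansion sneaks in is to check the string's internal UTF8 flag just before it reaches RC4, and to force it back to plain octets there. The sketch below is only an illustration: `to_octets` is a hypothetical helper, not part of Crypt::RC4 or PDF::API2, and it only covers the case where the string has been upgraded internally (for example by concatenation with a decoded string). If the data was already encoded to UTF-8 bytes further upstream, it would need decoding there instead.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the string as plain octets.
# utf8::downgrade with a true second argument returns false instead of
# dying when the string holds characters above 0xFF.
sub to_octets {
    my ($s) = @_;    # work on a copy, leave the caller's string alone
    utf8::downgrade($s, 1)
        or die "string contains characters above 0xFF\n";
    return $s;
}

# A short excerpt of the message from the question.
my $bytes = pack 'H*', '6528BF4E5E';

# Simulate what may be happening in the webpage code path:
# same characters, but stored internally as UTF-8.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

my $clean = to_octets($upgraded);

printf "flag before: %d, after: %d\n",
    utf8::is_utf8($upgraded) ? 1 : 0,
    utf8::is_utf8($clean)    ? 1 : 0;
```

Printing `utf8::is_utf8($message)` in both the webpage and the test script would show whether the two calls really do pass identical strings.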