in reply to perplexing inconsistency using RC4 and unpack

Please show a small code sample that demonstrates the problem.

Re^2: perplexing inconsistency using RC4 and unpack
by oddmedley (Novice) on Aug 01, 2009 at 03:27 UTC
    It's difficult to show a small sample, really. On further investigation, it seems unpack is not doing what I'd expect. The message substr, 637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6 (32 bytes), is correctly unchanged all the way until we unpack it:
    my $ms = $ml < $MAX_CHUNK_SIZE ? $ml : $MAX_CHUNK_SIZE;
    for my $piece ( 0 .. $num_pieces - 1 ) {
        my $ss = substr( $message, $piece * $MAX_CHUNK_SIZE, $ms );    # fine up to here...
        my $ssl = length $ss;
        my @message = unpack( "C*", $ss );
        ### at this point the characters are changed... we go from 32 to 37 characters (in the array). why?
        # i've changed the code in the RC4 package here slightly to allow testing.
        # it produces identical results in all cases considered here.
    then it becomes (pack'ing it again and converting to hex for display):
    637573746F6D65726E616D6528C2BF4E5E4E75C28A4164004E56C3BFC3BA01082E2E00C2B6
    looking at these 2 strings you can see some characters are being inserted:
    637573746F6D65726E616D6528 BF4E5E4E75 8A4164004E56 FFFA 0108 2E2E00 B6 #correct
    637573746F6D65726E616D6528 C2 BF4E5E4E75 C2 8A4164004E56 C3BFC3BA 0108 2E2E00 C2 B6 #faulty

    so unpack is somehow inserting those C2 bytes (Â in Latin-1) and changing FFFA (2 bytes) to C3BFC3BA (4 bytes)

    what could be making unpack behave in this way?

      You are suffering from UTF-8 expansion:

      #!/usr/bin/perl -wl
      print unpack "U0H*", "\x{BF}";
      print unpack "U0H*", "\x{FF}";
      print unpack "U0H*", "\x{FA}";
      __END__
      c2bf
      c3bf
      c3ba

      Now you just need to figure out where the UTF-8 expansion is sneaking in.

      - tye        
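
The expansion described above can be reproduced from the hex dump in the question. The following is a minimal sketch, not the original poster's code: `internal_hex` is a hypothetical helper, and `utf8::upgrade` merely stands in for whatever step actually set the UTF-8 flag in the real program. Checking `utf8::is_utf8` on intermediate values in this way is one possible method of finding where the flag sneaks in.

```perl
use strict;
use warnings;

# The 32-byte message from the question, rebuilt from its hex dump.
my $hex   = "637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6";
my $bytes = pack "H*", $hex;

# Assumption: simulate the contamination by upgrading a copy. The characters
# stay the same, but Perl now stores them internally as UTF-8.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

# Hypothetical helper: hex dump of the octets a string occupies internally.
sub internal_hex {
    my ($str) = @_;                                # $str is a copy of the argument
    utf8::encode($str) if utf8::is_utf8($str);     # characters -> UTF-8 octets
    return uc unpack "H*", $str;
}

for my $case ( [ bytes => $bytes ], [ upgraded => $upgraded ] ) {
    my ( $label, $str ) = @$case;
    printf "%-8s flag=%-3s chars=%d octets=%d\n  %s\n",
        $label,
        ( utf8::is_utf8($str) ? "on" : "off" ),
        length $str,
        length( internal_hex($str) ) / 2,
        internal_hex($str);
}
```

Both strings still compare equal with `eq` and both have a length of 32 characters; only the internal storage differs (32 octets versus 37), which is why the problem is easy to miss until something looks at the raw octets.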

        thank you tye, that was extremely helpful. it seems I had an issue with a blank string being converted to a hash somewhere in an XML parsing phase, which was later cast to a string (which seemed to make perl guess it was a UTF-8 one). several concatenations later, what looked like a nice plain ASCII string actually wasn't. the solution? specify my input more rigorously:
        if ( $options->{'customerpass'} && ( ref $options->{'customerpass'} ne "HASH" ) ) {
            $options->{'customerpass'} = encode( "iso-8859-1", $options->{'customerpass'} );
        }
        else {
            $options->{'customerpass'} = '';
        }
        this leaves nothing to guesswork and fixed my problem. (this call to encode ( use Encode; ) basically says (I think), 'whatever this string looks like internally, give it to me as Latin-1 bytes, end of story.' no more UTF-8 expansion!)
        thanks for the help!
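
The effect of that `encode` call can be checked in isolation. This is a small sketch under stated assumptions: the value `"p\xE4ss"` is a made-up stand-in for the real password, and `utf8::upgrade` again simulates the stray UTF-8 flag picked up during XML parsing.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical value containing one Latin-1 character (0xE4).
my $password = "p\xE4ss";
utf8::upgrade($password);    # assumption: simulate the stray UTF-8 flag

# encode() returns a byte string: one octet per character for Latin-1.
my $octets = encode( "iso-8859-1", $password );

printf "flag before: %s, flag after: %s\n",
    ( utf8::is_utf8($password) ? "on" : "off" ),
    ( utf8::is_utf8($octets)   ? "on" : "off" );

# unpack "C*" now yields exactly one value per character.
my $dump = join " ", map { sprintf "%02X", $_ } unpack "C*", $octets;
print "$dump\n";    # prints "70 E4 73 73"
```

One caveat worth knowing: with its default settings `encode` substitutes a replacement character for anything that does not fit in Latin-1 rather than failing. `utf8::downgrade($string)` is a possible alternative that dies on such characters instead, which may be preferable for binary data fed into a cipher.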
      I don't really follow, but why not try unpack "H*" without the "C*" step?
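
For display purposes, the suggestion above amounts to something like the following sketch. The 4-byte chunk is a hypothetical sample taken from the middle of the message in the question, and the comparison assumes a plain byte string.

```perl
use strict;
use warnings;

# Hypothetical 4-byte chunk (bytes 13 to 16 of the message in the question).
my $ss = pack "H*", "28BF4E5E";

# Hex dump in one step...
my $direct = uc unpack "H*", $ss;

# ...versus going through a list of byte values first.
my $via_list = join "", map { sprintf "%02X", $_ } unpack "C*", $ss;

print "$direct\n";      # prints "28BF4E5E"
print "$via_list\n";    # prints "28BF4E5E"
```

This only shortens the display step. RC4 still needs the individual byte values, so the `unpack "C*"` in the cipher code itself stays.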