in reply to Re: perplexing inconsistency using RC4 and unpack
in thread perplexing inconsistency using RC4 and unpack

it's difficult to show a small sample, really. on further investigation it seems unpack is not doing what I'd expect it to. the message substr 637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6 stays correctly unchanged right up until we unpack it:
my $ms = $ml < $MAX_CHUNK_SIZE ? $ml : $MAX_CHUNK_SIZE;
for my $piece ( 0..$num_pieces - 1 ) {
    my $ss = substr($message, $piece * $MAX_CHUNK_SIZE, $ms);
    # fine up to here...
    my $ssl = length $ss;
    my @message = unpack( "C*", $ss );
    ### at this point the characters are changed... we go from 32 to 37 characters (in the array). why?
    # i've changed the code in the RC4 package here slightly to allow testing.
    # it produces identical results in all cases considered here.
then it becomes (pack'ing it again and converting to hex for display):
637573746F6D65726E616D6528C2BF4E5E4E75C28A4164004E56C3BFC3BA01082E2E00C2B6
looking at these 2 strings you can see some characters are being inserted:
637573746F6D65726E616D6528 BF 4E5E4E75 8A 4164004E56 FF FA 0108 2E2E00 B6 #correct
637573746F6D65726E616D6528 C2BF 4E5E4E75 C28A 4164004E56 C3BF C3BA 0108 2E2E00 C2B6 #faulty

so unpack is somehow inserting those C2 bytes (Â in latin1) and changing FF FA (2 bytes) to C3BF C3BA (4 bytes)
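
One way to put numbers on the comparison is to split both hex dumps into one-byte groups, count them, and list which bytes of the correct string have the high bit set. This is only a sketch for inspecting the two strings quoted in this post; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# The two hex dumps quoted in the post.
my $correct = '637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6';
my $faulty  = '637573746F6D65726E616D6528C2BF4E5E4E75C28A4164004E56C3BFC3BA01082E2E00C2B6';

# Split each dump into two-character groups, i.e. one group per byte.
my @good = unpack '(A2)*', $correct;
my @bad  = unpack '(A2)*', $faulty;

# Bytes of the correct string whose value is 0x80 or above.
my @high = grep { hex($_) >= 0x80 } @good;

printf "correct: %d bytes\n", scalar @good;
printf "faulty:  %d bytes\n", scalar @bad;
printf "high bytes in correct string: %s\n", join ' ', @high;
```

The faulty dump is longer by exactly the number of high bytes in the correct one, so each byte at 0x80 or above appears to have grown by one byte.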

what could be making unpack behave in this way?

Re^3: perplexing inconsistency using RC4 and unpack (UTF-8)
by tye (Sage) on Aug 01, 2009 at 05:44 UTC

    You are suffering from UTF-8 expansion:

    #!/usr/bin/perl -wl
    print unpack "U0H*", "\x{BF}";
    print unpack "U0H*", "\x{FF}";
    print unpack "U0H*", "\x{FA}";
    __END__
    c2bf
    c3bf
    c3ba

    Now you just need to figure out where the UTF-8 expansion is sneaking in.

    - tye        
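
To track down where the expansion sneaks in, one option is to check Perl's internal UTF-8 flag at each stage of the pipeline. The sketch below is an illustration, not code from this thread: `describe` is a hypothetical helper, and the upgrade is simulated by appending and then removing a wide character, which is only one of several ways a string can end up flagged.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): summarise a scalar's
# internal UTF-8 flag and its length in characters.
sub describe {
    my ($label, $str) = @_;
    return sprintf '%s: utf8=%d length=%d',
        $label, (utf8::is_utf8($str) ? 1 : 0), length $str;
}

my $bytes = "\xBF\xFF\xFA";    # the three bytes used in the reply above

# Concatenating a character above 0xFF upgrades the whole string;
# removing that character again leaves the flag switched on.
my $upgraded = $bytes . "\x{263A}";
chop $upgraded;

print describe('bytes',    $bytes),    "\n";
print describe('upgraded', $upgraded), "\n";

# Encoding the upgraded string as UTF-8 shows the same expansion.
my $encoded = $upgraded;
utf8::encode($encoded);
my $hex = unpack 'H*', $encoded;
print "$hex\n";
```

Both strings report the same length in characters, so the difference is invisible until something looks at the underlying bytes. Calling a helper like this after each concatenation or parsing step should narrow down where the flag first turns on.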

      thank you tye, that was extremely helpful. it seems I had an issue with a blank string being converted to a hash somewhere in an XML parsing phase, which was later cast to a string (which seemed to make perl guess it was a UTF-8 one). several concatenations later, what looked like a nice plain ascii string actually wasn't. the solution? specify my input more rigorously:
      if ($options->{'customerpass'} && (ref $options->{'customerpass'} ne "HASH")) {
          $options->{'customerpass'} = encode("iso-8859-1", $options->{'customerpass'});
      } else {
          $options->{'customerpass'} = '';
      }
      this leaves nothing to guesswork and fixed my problem. (this call to encode (use Encode;) basically says (I think), 'whatever it looks like, give me this string as latin1 bytes, end of story.' no more utf8 expansion!)
      thanks for the help!
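
A minimal sketch of why that fix works, using the message bytes from the top of the thread. The upgrade is simulated with `utf8::upgrade`, which is an assumption made for illustration; in the real code it came from the XML parsing and concatenation described above. The variable names are also made up.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The 32-byte message from the original post.
my $raw = pack 'H*', '637573746F6D65726E616D6528BF4E5E4E758A4164004E56FFFA01082E2E00B6';

# Simulate the problem: same characters, but stored as UTF-8 internally.
my $suspect = $raw;
utf8::upgrade($suspect);

# encode() returns a plain byte string with the UTF-8 flag off.
my $clean = encode('iso-8859-1', $suspect);

my @codes = unpack 'C*', $clean;
my $hex   = uc unpack('H*', $clean);

printf "%d values, utf8 flag %s\n",
    scalar @codes, (utf8::is_utf8($clean) ? 'on' : 'off');
print "$hex\n";
```

`utf8::downgrade($string)` is an alternative when the string is known to contain only characters up to 0xFF. By default it dies rather than substituting if a wider character is present, which may be preferable for data that is about to be encrypted.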
Re^3: perplexing inconsistency using RC4 and unpack
by Anonymous Monk on Aug 01, 2009 at 03:37 UTC
    I don't really follow, but why not try unpack "H*" without the "C*" step?