zeltus has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I have a perplexing problem whereby I expect to receive a password via a SOAP interface. When the password contains £ (or €) the is-password-correct test fails.

I have produced a hugely much-simplified example below and I hope someone can give me some pointers as to how to go about determining what the mismatch is, 'cos I can't see it. It doesn't seem to be utf-8 related but then, what do I know? Anyway, clues, tips on how to determine why these strings are different, gratefully received.

    # $code received into system via a SOAP interface...
    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:Whatever">
      <soapenv:Header>
        <urn:AuthHeader>
          <Customer>TopDog</Customer>
          <Username>Underdog</Username>
          <passwordString>S$ummer£!01</passwordString>
    . . .

    # perl example fragment
    my $control = q(S$ummer£!01);
    if ( $control eq $code ) { warn("+++ $name: strings DO match\n"); }
    if ( $control lt $code ) { warn("+++ $name: control lt code\n"); }
    if ( $control gt $code ) { warn("+++ $name: control gt code\n"); }

returns

    +++ _password_matcher: control gt code

Re: string mis-match?
by Corion (Patriarch) on Aug 13, 2015 at 14:29 UTC

    My guess is that the SOAP data comes in a different encoding than your stored password. You will need to be explicit in your encoding, that is, use \N{...} notation for Unicode characters, and you need to decode the data you receive from SOAP (if your SOAP module of choice hasn't done so already).

      My knowledge has taken me this far. But I don't know what encoding SOAP is using - as I said, it works fine for "normal" characters, it's them pesky £ & € signs that are causing issues.

      I have tried all I can think of to highlight whatever it is that "$str1 eq $str2" is carping about and I've run out of ideas. Hence my post :-(

Re: string mis-match?
by SuicideJunkie (Vicar) on Aug 13, 2015 at 14:24 UTC

    I suggest printing both strings out with delimiters to see what they look like:

    printf "]%s[\n", $string1;
    printf "]%s[\n", $string2;

    The ]['s will make whitespace in the strings noticeable.

    If you still don't see a difference, print the hex values of each byte in the string to see where it mismatches. And consider changing your system font if it turns out to be an I vs 1 thing or something similar.
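A minimal sketch of the hex-dump suggestion. The helper name `hex_dump` and both sample strings are hypothetical: they stand in for a stored password whose pound sign is two UTF-8 bytes and a received one whose pound sign is a single byte.

```perl
use strict;
use warnings;

# Hypothetical helper: hex value of every character in a string
# (for a byte string, that is every byte).
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $s;
}

# Illustrative values only, not the real data.
my $control = "S\$ummer\xC2\xA3!01";   # pound sign as two bytes
my $code    = "S\$ummer\xA3!01";       # pound sign as one byte

print hex_dump($control), "\n";   # 53 24 75 6d 6d 65 72 c2 a3 21 30 31
print hex_dump($code),    "\n";   # 53 24 75 6d 6d 65 72 a3 21 30 31
```

Lining the two dumps up shows exactly where strings that look identical on screen start to differ.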

      Yes, I do this sort of thing as a matter of course. The strings look identical. But the string comparison still declares a mis-match.

Re: string mis-match?
by Tux (Canon) on Aug 13, 2015 at 15:53 UTC

    They might look the same to you, but are they?

    $ perl -MData::Peek -CO -wE'$_="\xe2\x82\xac";say;DPeek$_'
    €
    PV("\342\202\254"\0)
    $ perl -MData::Peek -CO -wE'$_="\x{20ac}";say;DPeek$_'
    €
    PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
    $ perl -MData::Peek -CO -wE'$_="€";say;DPeek$_'
    €
    PV("\342\202\254"\0)
    $ perl -MData::Peek -Mutf8 -CO -wE'$_="€";say;DPeek$_'
    €
    PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
    
    
    $ perl -E'"\x{20ac}" eq "€" and say "EQ"'
    $ perl -E'"\xe2\x82\xac" eq "€" and say "EQ"'
    EQ
    $ perl -Mutf8 -E'"\x{20ac}" eq "€" and say "EQ"'
    EQ
    $ perl -MEncode=encode -E'encode ("utf-8", "\x{20ac}") eq "€" and say "EQ"'
    EQ
    

    I suggest you use Encode's encode to guarantee valid utf-8'ness


    Enjoy, Have FUN! H.Merijn

Re: string mis-match?
by johngg (Canon) on Aug 13, 2015 at 15:08 UTC

    if ( $control eq $code ) { warn("+++ $name: strings DO match\n"); }
    if ( $control lt $code ) { warn("+++ $name: control lt code\n"); }
    if ( $control gt $code ) { warn("+++ $name: control gt code\n"); }

    You can shorten this code by putting your messages into an array in the order equal, more, less, then subscripting it with the result of a single cmp instead of running three separate tests. cmp returns 0, 1 or -1, and index -1 selects the last element of the array.

    $ perl -Mstrict -Mwarnings -e '
    my $code = q{M};
    my @msgs = qw{ equal more less };
    warn qq{$_ }, $msgs[ $_ cmp $code ], qq{ $code\n}
        for qw{ A M Z };'
    A less M
    M equal M
    Z more M
    $

    I hope this is of interest.

    Cheers,

    JohnGG

Re: string mis-match?
by zeltus (Beadle) on Aug 13, 2015 at 16:15 UTC

    OK, I've made some progress... it seems perl is doing what I expect it to, i.e. use utf-8 when I populate a strin...

    So a £ sign is represented by x'c2a3 (U+00A3), a 2-byte code.

    But SOAP is sending in a single-byte x'a3 - which is the ASCII code for a £

    My question has therefore evolved into "How do I tell SOAP to send utf-8 to me OR how do I use perl to convert to utf-8?"

    (I have tried using Encode's 'decode_utf8' on the string received from SOAP but it didn't make any difference that I could see...)

    well, it's progress Jim, but not as we know it.

      But SOAP is sending in a single-byte x'a3 - which is the ASCII code for a £

      ASCII codes end at 0x7F. Everything beyond that is something else. ISO-8859-1 and its superset Windows-1252 use 0xA3 to represent £. So maybe SOAP is sending data encoded in ISO-8859-1 or Windows-1252. Maybe it is a different encoding that also uses 0xA3 to represent £. https://en.wikipedia.org/wiki/ISO/IEC_8859 documents that 0xA3 represents £ in ISO-8859-1, -3, -7, -8, -9, -13, -14, and -15.
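To make the byte-level difference concrete, here is a small sketch that encodes U+00A3 under a few of the encodings mentioned above and prints the resulting bytes. The choice of encodings is illustrative; nothing in the thread confirms which one the SOAP side actually uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $pound = "\x{A3}";   # U+00A3 POUND SIGN as a single Perl character

# Encode the same character under several candidate encodings and
# show the bytes each one produces.
for my $enc (qw(UTF-8 ISO-8859-1 cp1252 ISO-8859-15)) {
    my $bytes = encode($enc, $pound);
    my $hex   = join ' ', map { sprintf '%02x', ord } split //, $bytes;
    printf "%-12s %s\n", $enc, $hex;
}
```

Only UTF-8 yields the two bytes c2 a3. The single-byte encodings listed here all yield a3, so the byte value alone cannot tell them apart.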

      I have tried using Encode's 'decode_utf8' on the string received from SOAP

      How do you expect that to work? Did you actually read the Encode documentation? decode_utf8() expects a UTF-8 encoded bytestream and returns a perl string. You feed it a bytestream that is encoded in ISO-8859-1, Windows-1252, or something else, but not UTF-8. You need a function that can decode the encoding used for the bytestream from SOAP into a perl string. That function is called decode() and unlike decode_utf8(), it also expects an encoding name. See Encode for details.
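A minimal sketch of that advice, assuming the SOAP bytes are ISO-8859-1 (the thread does not confirm this; substitute whichever encoding the sender really uses). Both byte strings are hypothetical stand-ins built from the values reported above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-ins for the two sides of the comparison.
my $control_bytes = "S\$ummer\xC2\xA3!01";   # pound sign as UTF-8 bytes c2 a3
my $soap_bytes    = "S\$ummer\xA3!01";       # pound sign as single byte a3

# Decode each side from its own encoding into a Perl character string.
my $control = decode('UTF-8',      $control_bytes);
my $code    = decode('ISO-8859-1', $soap_bytes);   # assumed encoding

print "bytes: ", ( $control_bytes eq $soap_bytes ? "match" : "differ" ), "\n";
print "chars: ", ( $control       eq $code       ? "match" : "differ" ), "\n";
```

The byte strings differ, while the decoded character strings compare equal. The comparison only becomes meaningful once each side has been decoded from the encoding it actually arrived in.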

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)