string mis-match?

zeltus has asked for the wisdom of the Perl Monks concerning the following question:

I have a perplexing problem whereby I expect to receive a password via a SOAP interface. When the password contains £ (or €), the is-password-correct test fails.

I have produced a much-simplified example below, and I hope someone can give me some pointers on how to work out what the mismatch is, 'cos I can't see it. It doesn't seem to be UTF-8 related, but then, what do I know? Anyway, clues and tips on how to determine why these strings differ are gratefully received.

# $code received into system via a SOAP interface...

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:Whatever">
   <soapenv:Header>
      <urn:AuthHeader>
         <Customer>TopDog</Customer>
         <Username>Underdog</Username>
            <passwordString>S$ummer£!01</passwordString>
.
.
.

# perl example fragment

my $control = q(S$ummer£!01);

if ( $control eq $code ) { warn("+++ $name: strings DO match\n"); }
if ( $control lt $code ) { warn("+++ $name: control lt code\n"); }
if ( $control gt $code ) { warn("+++ $name: control gt code\n"); }

returns

+++ _password_matcher: control gt code

Re: string mis-match? by Corion (Patriarch) on Aug 13, 2015 at 14:29 UTC
My guess is that the SOAP data comes in a different encoding than your stored password. You will need to be explicit in your encoding, that is, use `\N{...}` notation for Unicode characters, and you need to decode the data you receive from SOAP (if your SOAP module of choice hasn't done so already).
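A minimal sketch of that advice, assuming the SOAP layer hands over raw UTF-8 bytes. The variable `$raw_from_soap` and its contents are hypothetical stand-ins, not real SOAP output. The stored password uses `\N{POUND SIGN}`, so it no longer depends on how the source file was saved, and the incoming bytes are decoded with Encode's `decode` before the comparison.

```perl
use strict;
use warnings;
use charnames ':full';          # enables \N{POUND SIGN} on older perls too
use Encode qw(decode);

# Stored password: the pound sign is written as an explicit Unicode name,
# so the string holds the character U+00A3 whatever encoding the file uses.
my $control = "S\$ummer\N{POUND SIGN}!01";

# Hypothetical raw bytes from the SOAP layer, ASSUMED to be UTF-8 encoded.
my $raw_from_soap = "S\$ummer\xC2\xA3!01";

# Turn the bytes into a Perl character string before comparing.
my $code = decode('UTF-8', $raw_from_soap);

printf "%s\n", ($control eq $code ? 'strings DO match' : 'strings differ');
```

If the SOAP module has already decoded the data, decoding it a second time would garble non-ASCII characters, which is why the "if your SOAP module hasn't done so already" caveat matters.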
Re^2: string mis-match? by zeltus (Beadle) on Aug 13, 2015 at 14:57 UTC
My knowledge has taken me this far. But I don't know what encoding SOAP is using. As I said, it works fine for "normal" characters; it's them pesky £ & € signs that are causing issues. I have tried all I can think of to highlight whatever it is that "$str1 eq $str2" is carping about, and I've run out of ideas. Hence my post :-(
Re: string mis-match? by SuicideJunkie (Vicar) on Aug 13, 2015 at 14:24 UTC
I suggest printing both strings out with delimiters to see what they look like: `printf "]%s[\n", $string1; printf "]%s[\n", $string2;` The ]['s will make whitespace in the strings noticeable. If you still don't see a difference, print the hex values of each byte in the string to see where it mismatches. And consider changing your system font if it turns out to be an I vs 1 thing or something similar.
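One way to follow the hex-dump suggestion, as a sketch: the helper `hex_dump` is hypothetical, and the two sample strings are made-up stand-ins for a pound sign stored as UTF-8 bytes and as a single byte. It prints the ordinal of every character, so bytes that render identically (or not at all) become visible.

```perl
use strict;
use warnings;

# Hypothetical helper: list the ordinal of every character in hex.
# For a byte string each character is one byte; for a decoded string
# the values are Unicode code points.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $s;
}

my $string1 = "S\$ummer\xC2\xA3!01";   # assumed: pound sign as UTF-8 bytes
my $string2 = "S\$ummer\xA3!01";       # assumed: pound sign as one byte

printf "]%s[\n", hex_dump($string1);   # ]53 24 75 6d 6d 65 72 c2 a3 21 30 31[
printf "]%s[\n", hex_dump($string2);   # ]53 24 75 6d 6d 65 72 a3 21 30 31[
```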
Re^2: string mis-match? by zeltus (Beadle) on Aug 13, 2015 at 14:54 UTC
Yes, I do this sort of thing as a matter of course. The strings look identical, but the string comparison still declares a mis-match.
Re: string mis-match? by Tux (Canon) on Aug 13, 2015 at 15:53 UTC
They might look the same to you, but are they?

    $ perl -MData::Peek -CO -wE'$_="\xe2\x82\xac";say;DPeek$_'
    €
    PV("\342\202\254"\0)
    $ perl -MData::Peek -CO -wE'$_="\x{20ac}";say;DPeek$_'
    €
    PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
    $ perl -MData::Peek -CO -wE'$_="€";say;DPeek$_'
    €
    PV("\342\202\254"\0)
    $ perl -MData::Peek -Mutf8 -CO -wE'$_="€";say;DPeek$_'
    €
    PV("\342\202\254"\0) [UTF8 "\x{20ac}"]
    $ perl -E'"\x{20ac}" eq "€" and say "EQ"'
    $ perl -E'"\xe2\x82\xac" eq "€" and say "EQ"'
    EQ
    $ perl -Mutf8 -E'"\x{20ac}" eq "€" and say "EQ"'
    EQ
    $ perl -MEncode=encode -E'encode ("utf-8", "\x{20ac}") eq "€" and say "EQ"'
    EQ

I suggest you use Encode's `encode` to guarantee valid utf-8'ness.

Enjoy, Have FUN! H.Merijn
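Data::Peek is a CPAN module; a rough core-only sketch of the same point compares `length` instead. It assumes nothing about the real SOAP data. It only shows that the three-byte UTF-8 form of € and the one-character form are different strings to `eq` until one is converted to the other with Encode's `encode`.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $bytes = "\xe2\x82\xac";   # three bytes: the UTF-8 encoding of the euro sign
my $chars = "\x{20ac}";       # one character: EURO SIGN

# Both print as the same glyph on a UTF-8 terminal, but they are
# different strings as far as eq is concerned.
printf "bytes: length %d\n", length $bytes;                                   # 3
printf "chars: length %d\n", length $chars;                                   # 1
printf "eq before encode: %s\n", ($bytes eq $chars ? 'yes' : 'no');           # no

# Encoding the character string to UTF-8 bytes makes the two comparable.
printf "eq after encode: %s\n", (encode('UTF-8', $chars) eq $bytes ? 'yes' : 'no');   # yes
```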
Re: string mis-match? by johngg (Canon) on Aug 13, 2015 at 15:08 UTC
    if ( $control eq $code ) { warn("+++ $name: strings DO match\n"); }
    if ( $control lt $code ) { warn("+++ $name: control lt code\n"); }
    if ( $control gt $code ) { warn("+++ $name: control gt code\n"); }

You can shortcut this code by putting your messages into an array in the order equal, more, less, then subscripting by the comparison using `cmp` rather than three separate tests (`cmp` returns 0, 1 or -1, and subscript -1 picks the last element).

    $ perl -Mstrict -Mwarnings -e '
    my $code = q{M};
    my @msgs = qw{ equal more less };
    warn qq{$_ }, $msgs[ $_ cmp $code ], qq{ $code\n} for qw{ A M Z };'
    A less M
    M equal M
    Z more M
    $

I hope this is of interest.

Cheers,

JohnGG
Re: string mis-match? by zeltus (Beadle) on Aug 13, 2015 at 16:15 UTC
OK, I've made some progress... it seems perl is doing what I expect it to, i.e. using UTF-8 when I populate a string... So a £ sign is represented by x'c2a3 (U+00A3), a 2-byte code. But SOAP is sending in a single-byte x'a3 - which is the ASCII code for a £.

My question has therefore evolved into "How do I tell SOAP to send UTF-8 to me, OR how do I use perl to convert to UTF-8?" (I have tried using Encode's 'decode_utf8' on the string received from SOAP, but it didn't make any difference that I could see...) Well, it's progress, Jim, but not as we know it.
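The `control gt code` result from the original post is consistent with this diagnosis. A small sketch reproducing it, assuming the control string holds the bytes C2 A3 (what a literal £ in a UTF-8 source file without `use utf8` gives you) and the SOAP value holds the single byte A3. The helper `first_diff` is hypothetical.

```perl
use strict;
use warnings;

# Stand-ins (assumptions): the escapes mimic a literal pound sign typed into
# a UTF-8 source file without "use utf8", and the single byte from SOAP.
my $control = "S\$ummer\xC2\xA3!01";
my $code    = "S\$ummer\xA3!01";

# Hypothetical helper: offset of the first differing character,
# or -1 if the strings are identical.
sub first_diff {
    my ($x, $y) = @_;
    my $shorter = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $shorter - 1) {
        return $i if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return length($x) == length($y) ? -1 : $shorter;
}

my $i = first_diff($control, $code);
printf "first difference at offset %d: %02x vs %02x\n",
    $i, ord(substr($control, $i, 1)), ord(substr($code, $i, 1));

# Perl compares strings character by character; 0xC2 > 0xA3, hence "gt".
print "control gt code\n" if $control gt $code;
```

This prints the offset 7 with `c2 vs a3`, followed by `control gt code`.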
Re^2: string mis-match? by afoken (Chancellor) on Aug 13, 2015 at 18:07 UTC
> But SOAP is sending in a single-byte x'a3 - which is the ASCII code for a £

ASCII codes end at 0x7F. Everything beyond that is something else. ISO-8859-1 and its superset Windows-1252 use 0xA3 to represent £. So maybe SOAP is sending data encoded in ISO-8859-1 or Windows-1252. Maybe it is a different encoding that also uses 0xA3 to represent £. https://en.wikipedia.org/wiki/ISO/IEC_8859 documents that 0xA3 represents £ in ISO-8859-1, -3, -7, -8, -9, -13, -14, and -15.

> I have tried using Encode's 'decode_utf8' on the string received from SOAP

How do you expect that to work? Did you actually read the Encode documentation? `decode_utf8()` expects a UTF-8 encoded bytestream and returns a perl string. You feed it a bytestream that is encoded in ISO-8859-1, Windows-1252, or something else, but not UTF-8. You need a function that can decode the encoding used for the bytestream from SOAP into a perl string. That function is called `decode()` and, unlike `decode_utf8()`, it also expects an encoding name. See Encode for details.

Alexander

-- 
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
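A sketch of this point, assuming the sender really uses ISO-8859-1 (the variable names and sample bytes are illustrative, not taken from the real SOAP service). `decode_utf8` cannot make sense of a lone 0xA3, whereas `decode` with the right encoding name yields the same character string as the control.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(decode decode_utf8);

my $control = "S\$ummer\N{POUND SIGN}!01";   # character string containing U+00A3
my $raw     = "S\$ummer\xA3!01";              # ASSUMED SOAP bytes, ISO-8859-1

# Wrong decoder: 0xA3 on its own is not valid UTF-8, so with the default
# CHECK setting decode_utf8 substitutes U+FFFD and the strings still differ.
my $wrong = decode_utf8($raw);

# Right decoder for this ASSUMED encoding; Windows-1252 also maps 0xA3 to U+00A3.
my $right = decode('ISO-8859-1', $raw);

printf "decode_utf8: %s\n", ($control eq $wrong ? 'match' : 'mismatch');
printf "ISO-8859-1:  %s\n", ($control eq $right ? 'match' : 'mismatch');
```

Which encoding name to pass is still the open question; the `charset` parameter of the request's Content-Type header or the encoding in the XML declaration, if the client sends either, would be the first places to look.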