There are of course several ways to do it. One is to convert the non-UTF-8 string to UTF-8 so that you can compare the two. (Note that I'm assuming your non-UTF-8 string literally contains the characters 'st\x{f9}', backslash and braces included, not what you would get from writing "st\x{f9}" in Perl code.)

This should do the job:

my $x = 'st\x{f9}';
$x =~ s/\\x\{([\da-fA-F]{2,4})\}/pack("U",hex($1))/ge;

Then you can do if ($x eq $u) ..., presuming that $u is your Unicode string (with the utf8 flag on!).
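For completeness, here is the substitution above wrapped into a small runnable script. This is only a sketch: the helper name unescape_hex is made up for illustration, and the utf8::upgrade call is just a stand-in for a $u that arrived with the utf8 flag already on.

```perl
use strict;
use warnings;

# Hypothetical helper: turn literal \x{...} sequences (backslash, x, braces)
# into the characters they name, using the same substitution as above.
sub unescape_hex {
    my ($s) = @_;
    $s =~ s/\\x\{([\da-fA-F]{2,4})\}/pack("U", hex($1))/ge;
    return $s;
}

my $x = unescape_hex('st\x{f9}');   # single quotes: the escape is literal text
my $u = "st\x{f9}";                  # double quotes: Perl interprets the escape
utf8::upgrade($u);                   # assumption: simulate a string with the utf8 flag on

print $x eq $u ? "equal\n" : "different\n";
```

Since the helper only touches sequences matching \x{...} with two to four hex digits, any other text in the string passes through unchanged.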

Update: another way would be:

use Encode;
my $x = 'st\x{f9}';
my $iso = eval '"'.$x.'"';
$x = Encode::decode('iso-8859-1', $iso);

but I would only use eval on an arbitrary string if I could be absolutely sure it doesn't contain anything malicious...

Note that the decode() is required if your \x{...} values are smaller than 0x100, because in that case the result of the eval is a byte string, not a Unicode string.

Update 2: actually, my last statement is not 100% correct (i.e., the decode is not strictly required here, but it doesn't do any harm either). The reason is that in a number of cases (like the one here with char 0xf9), Perl does an automagic upgrade of the ISO-8859-1 (Latin-1) string. In other words, when you compare a Latin-1 byte string with the corresponding real Unicode string, you do in fact get the (naively) expected result that they're equal...
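A quick way to see the upgrade behaviour for yourself is sketched below. The variable names are only illustrative. The last comparison is an extra contrast that is not part of the original question: bytes that are already UTF-8 encoded are a different matter and do not compare equal to the character string.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "st\xf9";                                # Latin-1 byte string
my $chars = Encode::decode('iso-8859-1', $bytes);    # decoded character string

# eq compares characters, so Perl treats the byte string as Latin-1 here.
print "latin-1 bytes vs chars: ", ($bytes eq $chars ? "equal" : "different"), "\n";

# UTF-8 encoded bytes are NOT upgraded this way: "\xc3\xb9" stays two characters.
my $utf8_bytes = Encode::encode('UTF-8', $chars);
print "utf-8 bytes vs chars:   ", ($utf8_bytes eq $chars ? "equal" : "different"), "\n";
```

So the automatic upgrade only helps when the byte string really is Latin-1; for anything else you still need an explicit decode() with the right encoding.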


In reply to Re: string comparison with escape sequences by almut
in thread string comparison with escape sequences by danmcb
