in reply to string comparison with escape sequences
There are of course several ways to do it, but one would be to convert the non-UTF-8 string to UTF-8 so that you can compare the two. (Note that I'm assuming your non-UTF-8 string literally contains the characters 'st\x{f9}', backslash and all, rather than what you would get from writing "st\x{f9}" in some Perl code.)
This should do the job:
my $x = 'st\x{f9}';
$x =~ s/\\x\{([\da-fA-F]{2,4})\}/pack("U",hex($1))/ge;
Then you can do if ($x eq $u) ..., presuming that $u is your Unicode string (with the utf8 flag on!).
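To make the substitution and the comparison concrete, here is a small self-contained sketch. The helper name unescape_hex and the way $u is built are my own assumptions for illustration; your $u would presumably come from elsewhere (a file, a database, ...).

```perl
use strict;
use warnings;

# unescape_hex() is a made-up helper name, not part of any module.
# It replaces each literal \x{HH} .. \x{HHHH} with the character it names.
sub unescape_hex {
    my ($text) = @_;
    $text =~ s/\\x\{([\da-fA-F]{2,4})\}/pack("U", hex($1))/ge;
    return $text;
}

my $x = unescape_hex('st\x{f9}');   # input is the literal 8-character text

# Assumption: $u stands in for your Unicode string.
my $u = "st\x{f9}";
utf8::upgrade($u);                  # make sure the utf8 flag is on

print length($x), "\n";                        # prints 3
print $x eq $u ? "equal\n" : "different\n";    # prints equal
```

Wrapping the substitution in a sub just keeps the original string untouched; the regex itself is the same as above.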
Update: another way would be:
use Encode;
my $x = 'st\x{f9}';
my $iso = eval '"'.$x.'"';
$x = Encode::decode('iso-8859-1', $iso);
but I would only use eval on an arbitrary string if I could be absolutely sure that it doesn't contain anything malicious...
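One way to get that certainty might be to check the string against a strict whitelist before handing it to eval. The following is only a sketch under my own assumptions: the helper name eval_if_plain is invented, and the set of allowed characters (ASCII letters, digits, spaces, plus \x{...} escapes) is something you would have to adapt to your data.

```perl
use strict;
use warnings;

# eval_if_plain() is a hypothetical helper, not from any module.
# It refuses to eval anything that is not made up solely of ASCII letters,
# digits, spaces and \x{...} escapes with 2 to 4 hex digits, so quotes,
# sigils and stray backslashes never reach eval.
sub eval_if_plain {
    my ($text) = @_;
    return unless $text =~ /\A(?:[A-Za-z0-9 ]|\\x\{[0-9a-fA-F]{2,4}\})*\z/;
    return eval '"' . $text . '"';
}

my $ok  = eval_if_plain('st\x{f9}');                   # accepted
my $bad = eval_if_plain('st"; system("echo hi"); "');  # rejected

print defined $ok  ? "evaluated\n" : "rejected\n";
print defined $bad ? "evaluated\n" : "rejected\n";
```

Anything that fails the check comes back as undef, so the caller can decide whether to skip the string or complain.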
Note that the decode() is required if your \x{...} values are smaller than 0x100, in which case the string will not be a Unicode (utf8-flagged) string after the eval.
Update 2: actually, my latter statement is not 100% correct: the decode is not strictly required here (though it doesn't do any harm either). The reason is that in a number of cases (like the one here with char 0xf9), Perl automagically upgrades the ISO-8859-1 string. In other words, when comparing an ISO-8859-1 (byte) string with the corresponding real Unicode string, you do in fact get the (naively) expected result that they are equal...
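A quick way to see this for yourself is the following sketch. The variable names are mine; utf8::upgrade and utf8::is_utf8 are built-in functions that are available without loading anything.

```perl
use strict;
use warnings;

my $bytes = "st\xf9";       # byte string, utf8 flag off
my $chars = "st\x{f9}";
utf8::upgrade($chars);      # same characters, but now with the utf8 flag on

print utf8::is_utf8($bytes) ? "flag on\n" : "flag off\n";   # prints flag off
print utf8::is_utf8($chars) ? "flag on\n" : "flag off\n";   # prints flag on

# eq compares characters, not the internal representation.
print $bytes eq $chars ? "equal\n" : "different\n";         # prints equal
```

The two strings are stored differently internally, yet eq treats them as the same sequence of characters.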