djihed has asked for the wisdom of the Perl Monks concerning the following question:

I have some ASCII strings that were output by a different program and written to a file. They contain escaped representations of UTF-8 bytes that I need to convert to real UTF-8. For example, how do I convert:
my $a = "\\xC4\\x80"; # 8 ascii characters
to
"\xC4\x80" # 2 code points representing a utf character.
A simple search and replace won't do, by the way.

Re: Converting string literal representation of utf to utf8
by dave_the_m (Monsignor) on Jun 17, 2011 at 10:56 UTC
    Is this what you want:
    my $a = "\\xC4\\x80";
    $a =~ s/\\x([0-9A-Fa-f]+)/chr(hex($1))/ge;
    utf8::decode($a);
    printf "length=%d, ord=0x%x\n", length($a), ord($a);
    Outputs:
    length=1, ord=0x100
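
    For anyone following along, here is a rough sketch of the same two steps wrapped in a sub. The name unescape_utf8 is made up for illustration, and the {2} quantifier is my own assumption that every escape has exactly two hex digits; with + a trailing hex-looking character (as in "\\x80AB") would be swallowed into the escape.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: turn literal "\xHH" escapes
# into bytes, then decode those bytes as UTF-8.
sub unescape_utf8 {
    my ($s) = @_;

    # Step 1: each 4-character sequence \xHH becomes the single byte it names.
    # Assumption: escapes always carry exactly two hex digits.
    $s =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Step 2: decode the byte string in place; a false return means
    # the bytes were not well-formed UTF-8.
    utf8::decode($s) or die "input did not decode as UTF-8\n";

    return $s;
}

my $in  = "\\xC4\\x80";
my $out = unescape_utf8($in);
printf "in=%d chars, out=%d char, ord=0x%x\n",
    length($in), length($out), ord($out);
# prints: in=8 chars, out=1 char, ord=0x100
```

    Plain ASCII text around the escapes passes through untouched, since ASCII is already valid UTF-8.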

    Dave.

      Yes, thanks.
Re: Converting string literal representation of utf to utf8
by moritz (Cardinal) on Jun 17, 2011 at 10:18 UTC
      What didn't work:
      my $a = "\\xC4\\x80";
      utf8::decode($a); # does not work, \\x are literal ascii.
      # is still the same.
      $a =~ s/\\x(\w\w)/\x$1/g; # Does not work. \x is interpreted without the code points.
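
      A small sketch of why these attempts leave the string alone, as far as I understand it. utf8::decode has nothing to do because the eight characters are plain ASCII, and in the s/// replacement the \x escape is parsed when the code is compiled, before $1 holds anything. Evaluating the replacement as code with /e avoids that. The hex_dump helper is hypothetical and only there to make the contents visible.

```perl
use strict;
use warnings;

# Hypothetical helper: show each character of a string as two hex digits.
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $str;
}

my $s = "\\xC4\\x80";    # backslash, x, C, 4, backslash, x, 8, 0

# Attempt 1: every character is already plain ASCII, so decoding changes nothing.
my $decoded = $s;
utf8::decode($decoded);
print "after decode: ", hex_dump($decoded), "\n";
# prints: after decode: 5C 78 43 34 5C 78 38 30

# With /e the replacement is run as code for each match,
# so hex($1) sees the digits that were actually captured.
my $bytes = $s;
$bytes =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/ge;
print "after s///ge: ", hex_dump($bytes), "\n";
# prints: after s///ge: C4 80
```

      After the substitution the string holds the two raw bytes, and a utf8::decode at that point has something to decode.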