Re: Perl's encoding versus UTF8 octets

The question is how \xc3\xa4 is actually represented in your file.

If you write qq|\xc3\xa4|, then Perl interprets the eight characters as two bytes with the hexadecimal values c3 and a4, respectively. But if you read \xc3\xa4 from a file, this interpretation doesn't take place: what you get are eight individual ASCII characters. What you can do, of course, is perform the interpretation yourself:

use Encode;
my $ucode = q/\xc3\xa4/; # note the use of 'q', not 'qq'
my $newcode = decode('utf8', $ucode =~ s/\\x([a-fA-F0-9]{2})/chr hex($1)/egr);
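
To make the difference between the two quoting forms visible, here is a small self-contained sketch. The variable names are made up for illustration, and it uses the strict 'UTF-8' encoding name rather than the lax 'utf8' above, which is my own substitution. The /r flag on s/// needs Perl 5.14 or later.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Interpolating quotes: Perl turns the escapes into two bytes.
my $interpolated = qq|\xc3\xa4|;

# Non-interpolating quotes: eight literal ASCII characters,
# which is what you get when the text is read from a file.
my $literal = q|\xc3\xa4|;

printf "qq: %d characters\n", length $interpolated;   # 2
printf "q:  %d characters\n", length $literal;        # 8

# Turn each literal \xHH escape into the byte it names.
my $bytes = $literal =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/egr;

# Decode the resulting UTF-8 octets into a Perl character string.
my $text = decode('UTF-8', $bytes);

printf "decoded: %d character, U+%04X\n", length $text, ord $text;
```

After the substitution, $bytes holds the same two octets as the qq form, and decoding them yields the single character U+00E4.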

Re^2: Perl's encoding versus UTF8 octets by Polyglot (Chaplain) on Jan 13, 2021 at 07:06 UTC
Yes, it appears you have the same solution that GrandFather posted, but have managed to crystallize it into a one-liner. Thank you. I'm sure future readers will appreciate this additional clarity. NOTE: Yes, the file does have those literal characters, and they were being processed as eight ASCII characters by Perl. Blessings, ~Polyglot~