in reply to Perl's encoding versus UTF8 octets

The question is how \xc3\xa4 is actually represented in your file.

If you write qq|\xc3\xa4|, Perl interprets those eight characters as two bytes with the hexadecimal values c3 and a4. But if you read \xc3\xa4 from a file, no such interpretation takes place: they remain eight individual ASCII characters. What you can do, of course, is perform the interpretation yourself:

use Encode;
my $ucode = q/\xc3\xa4/; # note the use of 'q', not 'qq'
my $newcode = decode('utf8', $ucode =~ s/\\x([a-fA-F0-9]{2})/chr hex($1)/egr);
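
To make the q versus qq difference visible, here is a minimal, self-contained sketch. The variable names ($interpolated, $literal, $bytes, $text) are only illustrative, and I am assuming the strict 'UTF-8' encoding name is acceptable in place of the lax 'utf8'. It copies the string before substituting instead of using the /r flag, so it should also run on perls older than 5.14.

```perl
use strict;
use warnings;
use Encode qw(decode);

# qq processes the \x escapes: the result is two bytes, 0xC3 and 0xA4.
my $interpolated = qq|\xc3\xa4|;

# q does no escape processing: the result is eight ASCII characters,
# which is what you get when the text is read from a file.
my $literal = q|\xc3\xa4|;

printf "qq: %d characters\n", length $interpolated;
printf "q:  %d characters\n", length $literal;

# Replace each literal \xHH sequence with the byte it names
# (working on a copy so $literal stays untouched) ...
(my $bytes = $literal) =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/eg;

# ... then decode the resulting UTF-8 octets into a character string.
my $text = decode('UTF-8', $bytes);

printf "decoded: %d character, U+%04X\n", length $text, ord $text;
```

After the substitution, $bytes holds the same two octets as $interpolated, and decoding them gives the single character U+00E4.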

Re^2: Perl's encoding versus UTF8 octets
by Polyglot (Chaplain) on Jan 13, 2021 at 07:06 UTC

    Yes, it appears you have the same solution that GrandFather posted, but have managed to crystallize it into a one-liner. Thank you. I'm sure future readers will appreciate this additional clarity.

    NOTE: Yes, the file does contain those literal characters, and Perl was processing them as eight ASCII characters.

    Blessings,

    ~Polyglot~