```perl
#!/usr/local/bin/perl

use 5.016;
use utf8;
use strict;
use warnings;

binmode(STDIN,  ':encoding(utf-8)');
binmode(STDOUT, ':encoding(utf-8)');
binmode(STDERR, ':encoding(utf-8)');

my $string = qq[a Å];

my $fh = IO::File->new();
$fh->open(\$string, '<:encoding(UTF-8)');

say $fh->getc();    # a
say $fh->getc();    # SPACE
say $fh->getc();    # Å LATIN CAPITAL LETTER A WITH RING ABOVE (U+00C5)
$fh->ungetc(ord("Å"));
say $fh->getc();    # should be A RING again.
```
The ungetc() call is followed by two warnings from the final say: 'Malformed UTF-8 character (unexpected end of string) in say at unicode.pl line 21.' and '"\x{00c5}" does not map to utf8 at unicode.pl line 21.' But U+00C5 is the correct code point for the character, so it should map.
I used a hex editor to confirm that the bytes for A-ring (C3 85) are correct UTF-8.
This seems to happen with any character that takes two bytes in UTF-8.
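The messages above distinguish two things: the code point U+00C5 (what ord("Å") returns, and therefore what ungetc() is given in the script) and its two-byte UTF-8 encoding C3 85 (what the hex editor shows). As a small sketch, assuming only the core Encode module, the following prints both forms without needing a hex editor. It only illustrates the difference; it is not a fix for the ungetc() behaviour.

```perl
#!/usr/local/bin/perl
use 5.016;
use utf8;        # the "Å" literal below is saved as UTF-8 source
use strict;
use warnings;
use Encode qw(encode);

my $char = "Å";  # one character, code point U+00C5

# The code point, i.e. the number that ord() hands to ungetc():
printf "code point:  U+%04X\n", ord($char);

# The UTF-8 byte sequence that actually sits in the file or scalar.
# %vX prints each byte's ordinal in hex, joined with dots.
my $bytes = encode('UTF-8', $char);
printf "UTF-8 bytes: %vX (%d bytes)\n", $bytes, length $bytes;
```

Run as-is, this should print the code point as U+00C5 and the bytes as C3.85, two bytes in total.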
The final say prints '\xC5' (literally four characters: backslash, x, C, 5).
I've also tested this by reading from a file instead of a scalar variable; the result is the same.
This is perl 5, version 16, subversion 2 (v5.16.2) built for darwin-2level
Edited to add: the script is saved as UTF-8. That was the first thing I checked.
In reply to IO::Handle Unicode and ungetc() by coolmichael