in reply to UTF-8 problem (malformed...)

>perl -MEncode -e"print encode 'UTF-8', qq{use utf8; my \$x='a\xE1axBy'; \$x =~ /(.*?)x(A*)y/; }" | perl
Malformed UTF-8 character (unexpected continuation byte 0xa1, with no preceding start byte) in pattern match (m//) at - line 1.

huh? That's not right.

Other tools show that the following is produced.

61 C3 A1 61 78 42 79

C3 is indeed the valid start of a two-byte sequence, A1 is indeed a continuation byte, and C3 A1 is indeed the encoding of LATIN SMALL LETTER A WITH ACUTE (U+00E1). It's a bug.
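The byte dump can also be reproduced from Perl itself. A minimal sketch, assuming only the core Encode module; the variable names are mine:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The string from the one-liner: 'a', U+00E1, 'a', 'x', 'B', 'y'.
my $chars = "a\x{E1}axBy";

# Encode to UTF-8 and format each resulting byte as two hex digits.
my $bytes = encode('UTF-8', $chars);
my $hex   = join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;

print "$hex\n";   # 61 C3 A1 61 78 42 79
```

U+00E1 is the only character that becomes two bytes (C3 A1); everything else is ASCII and stays one byte each.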

I believe it's a known bug. It might be fixed in 5.12. You can work around it by putting the following at the start of the pattern:

(?!)\x{2660}|
I don't know if there is a better workaround.
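For concreteness, here is the workaround applied to the pattern from the one-liner. This is a sketch, not a verified fix: my understanding is that the code point above 0xFF forces the pattern to be compiled as UTF-8, while (?!) guarantees that branch can never match. The second sample string and the variable names are mine.

```perl
use strict;
use warnings;

my $plain  = qr/(.*?)x(A*)y/;
my $worked = qr/(?!)\x{2660}|(.*?)x(A*)y/;

# (?!) always fails, so the first alternative never matches and
# the capture groups keep their numbers ($1 and $2).
my @results;
for my $s ("a\x{E1}axBy", "a\x{E1}axAAy") {
    my @p = $s =~ $plain;    # captures, or empty list on failure
    my @w = $s =~ $worked;
    push @results, [ \@p, \@w ];
    my $same = (@p == @w) && ("@p" eq "@w");
    print $same ? "same result\n" : "different result\n";
}
```

Note that the string from the one-liner does not actually match the pattern (after the x comes B, not y), so the comparison is only interesting for the second string. Either way, the prefix should not change what the pattern matches.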

I need the two "use"-s, because otherwise I wasn't able to print out something like "áéíó" to terminal.

What you want is

use utf8; use open ':std', ':locale';
or
use utf8; use open ':std', ':encoding(cp852)';

Unfortunately, the first doesn't work on Windows :(
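To see what the explicit-encoding variant does to output, the same layer can be put on an in-memory filehandle and the resulting bytes inspected. A sketch, assuming the console code page really is cp852; the filehandle and variable names are mine, and escapes stand in for literal characters so the snippet does not depend on how the file is saved:

```perl
use strict;
use warnings;

# U+00E1 U+00E9 U+00ED U+00F3 (a, e, i, o with acute)
my $text = "\x{E1}\x{E9}\x{ED}\x{F3}";

# The same kind of layer that use open ':std', ':encoding(cp852)'
# puts on STDOUT, applied to a scalar so the bytes can be examined.
my $buf = '';
open my $fh, '>:encoding(cp852)', \$buf or die "open: $!";
print {$fh} $text;
close $fh or die "close: $!";

my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $buf;
print "$hex\n";   # A0 82 A1 A2
```

Each character comes out as a single cp852 byte, which is what a console using that code page expects. Without the layer, Perl would emit the Latin-1 bytes E1 E9 ED F3 instead, and the console would show the wrong glyphs.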

Re^2: UTF-8 problem (malformed...)
by Anonymous Monk on Dec 27, 2009 at 23:40 UTC
    You could no warnings 'utf8';