Re: UTF-8 problem (malformed...)

>perl -MEncode -e"print encode 'UTF-8', qq{use utf8; my \$x='a\xE1axBy'; \$x =~ /(.*?)x(A*)y/; }" | perl
Malformed UTF-8 character (unexpected continuation byte 0xa1, with no preceding start byte) in pattern match (m//) at - line 1.

Huh? That's not right.

Other tools show that the following is produced.

61 C3 A1 61 78 42 79

C3 is indeed the valid start of a two-byte sequence, A1 is indeed a continuation byte, and C3 A1 is indeed the encoding of LATIN SMALL LETTER A WITH ACUTE (U+00E1). It's a bug.
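
If you don't have a hex dumper handy, a minimal sketch like the following should reproduce that byte listing with Encode alone. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same string as in the one-liner: 'a', U+00E1, 'a', 'x', 'B', 'y'.
my $bytes = encode('UTF-8', "a\xE1axBy");

# Format each encoded byte as two uppercase hex digits.
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;

print "$hex\n";   # 61 C3 A1 61 78 42 79
```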

I believe it's a known bug. It might be fixed in 5.12. You can work around it by putting the following at the start of the pattern:

(?!)\x{2660}|

I don't know if there is a better workaround.
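
Here is a sketch of the workaround applied to the pattern from the one-liner. The `(?!)` can never match, so the added first alternative always fails and the capture groups keep their numbers. My guess (an assumption, I haven't checked the regex engine source) is that the code point above 0xFF forces the pattern to be compiled as UTF-8. The input string is a made-up one, chosen so that the pattern actually matches.

```perl
use strict;
use warnings;
use utf8;

my $x = 'aáaxAAy';   # hypothetical input, chosen so that the pattern matches

# The original pattern, then the same pattern with the workaround prepended.
my @plain     = $x =~ /(.*?)x(A*)y/;
my @prevented = $x =~ /(?!)\x{2660}|(.*?)x(A*)y/;

# Both forms should capture the same two substrings.
print "captures agree\n" if "@plain" eq "@prevented";
```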

I need the two "use"-s, because otherwise I wasn't able to print out something like "áéíó" to terminal.

What you want is

use utf8;
use open ':std', ':locale';
or

use utf8;
use open ':std', ':encoding(cp852)';

Unfortunately, the first doesn't work on Windows :(
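
To see what an `:encoding(cp852)` layer does to your text on the way out, you can do the same conversion by hand with Encode. This is only a sketch: it assumes your console really uses cp852, and it prints nothing but ASCII, so it is safe to run on any terminal.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode decode);

my $text  = 'áéíó';                  # four characters, thanks to "use utf8"
my $bytes = encode('cp852', $text);  # what the output layer would send to the console

printf "%d characters -> %d bytes\n", length($text), length($bytes);

# Decoding the bytes with the same encoding gives the original text back.
my $round_trip = decode('cp852', $bytes);
print "round trip ok\n" if $round_trip eq $text;
```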

Re^2: UTF-8 problem (malformed...) by Anonymous Monk on Dec 27, 2009 at 23:40 UTC
You could `no warnings 'utf8';`