in reply to UTF-8 problem (malformed...)
>perl -MEncode -e"print encode 'UTF-8', qq{use utf8; my \$x='a\xE1axBy'; \$x =~ /(.*?)x(A*)y/; }" | perl
Malformed UTF-8 character (unexpected continuation byte 0xa1, with no preceding start byte) in pattern match (m//) at - line 1.
huh? That's not right.
Other tools show that the following is produced.
61 C3 A1 61 78 42 79
C3 is indeed the valid start of a two-byte sequence, A1 is indeed a continuation byte, and C3 A1 is indeed the encoding of LATIN SMALL LETTER A WITH ACUTE (U+00E1). It's a bug.
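If you want to check those bytes without an external tool, here is a minimal sketch using only the core Encode module. The helper name utf8_hex is made up for this illustration; it is not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 encode a character string and
# return its bytes as space-separated uppercase hex pairs.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = encode('UTF-8', $str);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

# Same six characters as in the one-liner: a, U+00E1, a, x, B, y
print utf8_hex("a\xE1axBy"), "\n";   # 61 C3 A1 61 78 42 79
```

U+00E1 is the only character that takes two bytes (C3 A1); everything else is plain ASCII.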
I believe it's a known bug. It might be fixed in 5.12. You can work around it by putting the following at the start of the pattern:
(?!)\x{2660}|
I don't know if there is a better workaround.
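For illustration, this is roughly how the prefix would sit in front of the pattern from the one-liner. It is only a sketch: match_with_workaround is a made-up helper name, and I'm assuming the prefix does nothing except make the pattern itself Unicode. (?!) can never match, so the first alternative always fails and the regex engine falls through to the real pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: run the original pattern with the workaround prefix
# and return the two captures, or undef if there is no match.
sub match_with_workaround {
    my ($str) = @_;
    # (?!) always fails, so "(?!)\x{2660}" never matches anything;
    # the alternation then tries the original pattern unchanged.
    return $str =~ /(?!)\x{2660}|(.*?)x(A*)y/ ? [$1, $2] : undef;
}

# The string from the one-liner does not match (B sits between x and y).
my $no_match = match_with_workaround("a\x{E1}axBy");
print defined $no_match ? "matched\n" : "no match\n";

# A string that does match still yields the usual captures.
my $caps = match_with_workaround("a\x{E1}axAAy");
print "second capture: $caps->[1]\n" if $caps;
```

The captures keep their numbers because the added alternative contains no capture groups.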
I need the two "use"-s, because otherwise I wasn't able to print out something like "áéíó" to terminal.
What you want is
use utf8; use open ':std', ':locale';
or
use utf8; use open ':std', ':encoding(cp852)';
Unfortunately, the first doesn't work on Windows :(
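To see what the :encoding(cp852) layer does to the output, here is a small sketch that applies the same layer to an in-memory filehandle instead of STDOUT. The helper name cp852_bytes is invented for this example, and I'm assuming a console code page of 852 as in the line above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw bytes that an :encoding(cp852)
# layer writes for the given character string.
sub cp852_bytes {
    my ($text) = @_;
    my $buf = '';
    # Write through the encoding layer into an in-memory buffer.
    open my $fh, '>:encoding(cp852)', \$buf or die "open failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buf;
}

# U+00E1 (a with acute) becomes the single cp852 byte 0xA0.
print unpack('H*', cp852_bytes("\x{E1}")), "\n";   # a0
```

With use open ':std', ':encoding(cp852)' in effect, the same conversion is applied to everything printed on STDOUT and STDERR (and decoded on STDIN).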
Re^2: UTF-8 problem (malformed...)
by Anonymous Monk on Dec 27, 2009 at 23:40 UTC