adamvagyok has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have the following code (saved in UTF-8 format, using Notepad++):

#!/usr/bin/perl
use strict;
use warnings;
use encoding "cp852";
use utf8;

my $a="aáaxBy";
$a =~ /(.*?)x(A*)y/;

But it gives the following error message:

Malformed UTF-8 character (unexpected continuation byte 0xa1, with no preceding start byte) in pattern match (m//) at c:\Perl\work\gi\utftest.pl line 9.

I just don't understand why and how to get rid of it. How can a malformed string be in a UTF-8 saved text file?

Thanks for your help in advance,

Adam

P.S. I need the two "use"-s, because otherwise I wasn't able to print out something like "áéíó" to terminal.

Replies are listed 'Best First'.
Re: UTF-8 problem (malformed...)
by ikegami (Patriarch) on Dec 27, 2009 at 23:37 UTC
    >perl -MEncode -e"print encode 'UTF-8', qq{use utf8; my \$x='a\xE1axBy'; \$x =~ /(.*?)x(A*)y/; }" | perl
    Malformed UTF-8 character (unexpected continuation byte 0xa1, with no preceding start byte) in pattern match (m//) at - line 1.

    huh? That's not right.

    Other tools show that the following is produced.

    61 C3 A1 61 78 42 79

    C3 is indeed the valid start of a two-byte sequence, A1 is indeed a continuation byte, and C3 A1 is indeed the encoding of LATIN SMALL LETTER A WITH ACUTE (U+00E1). It's a bug.
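
The byte sequence above can be reproduced without an external tool. This is a minimal sketch, assuming only the core Encode module; the variable names are illustrative and not from the original post. It encodes the test string to UTF-8 and prints each byte in hex.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "a", U+00E1 (LATIN SMALL LETTER A WITH ACUTE), then "axBy"
my $chars = "a\x{E1}axBy";

# Turn the character string into UTF-8 encoded bytes.
my $bytes = encode('UTF-8', $chars);

# Render each byte as two uppercase hex digits.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;

print "$hex\n";   # 61 C3 A1 61 78 42 79
```

The C3 A1 pair in the output is the two-byte encoding of U+00E1, which is why an "unexpected continuation byte 0xa1" complaint points at a problem in perl rather than in the file.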

    I believe it's a known bug. It might be fixed in 5.12. You can work around it by putting the following at the start of the pattern:

    (?!)\x{2660}|
    I don't know if there is a better workaround.

    I need the two "use"-s, because otherwise I wasn't able to print out something like "áéíó" to terminal.

    What you want is

    use utf8; use open ':std', ':locale';
    or
    use utf8; use open ':std', ':encoding(cp852)';

    Unfortunately, the first doesn't work on Windows :(
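
To see what an :encoding(cp852) layer does to the output, here is a rough sketch that writes through such a layer into an in-memory buffer instead of STDOUT. The buffer is only a stand-in for a cp852 console, and the variable names are made up for the example. It assumes the script itself is saved as UTF-8, as in the original question.

```perl
use strict;
use warnings;
use utf8;   # the source file itself is saved as UTF-8

my $text = "áéíó";

# Write through a cp852 encoding layer into an in-memory buffer,
# standing in for a console that expects cp852.
my $buffer = '';
open my $out, '>:encoding(cp852)', \$buffer or die "open failed: $!";
print {$out} $text;
close $out;

# Show the bytes that would have reached the terminal.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $buffer;
print "$hex\n";
```

Each of the four characters comes out as a single byte, because cp852 is a single-byte encoding. With use open ':std', ':encoding(cp852)' the same conversion is applied to STDOUT, so no use encoding line is needed.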

      You could no warnings 'utf8';
Re: UTF-8 problem (malformed...)
by Anonymous Monk on Dec 27, 2009 at 22:57 UTC
    use encoding "cp852"; use utf8;
    First you're telling perl your program is written in cp852, and then you're saying it's utf8. Try only
    use encoding 'utf8';
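
Why the two declarations conflict can be illustrated with a small sketch, again assuming only the core Encode module and using illustrative variable names. The same two bytes are decoded once as UTF-8 and once as cp852, and the results differ in length.

```perl
use strict;
use warnings;
use Encode qw(decode);

# How a UTF-8 editor stores U+00E1 on disk.
my $bytes = "\xC3\xA1";

# Decode the same bytes under each of the two claimed encodings.
my $as_utf8  = decode('UTF-8', $bytes);
my $as_cp852 = decode('cp852', $bytes);

printf "as UTF-8: %d character(s)\n", length $as_utf8;
printf "as cp852: %d character(s)\n", length $as_cp852;
```

Read as UTF-8 the bytes form one character; read as cp852 they form two unrelated characters, so a source file can only sensibly be declared as one of the two.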
      Theoretically, you're right. But unfortunately it doesn't work. Gives the same error, and additionally, unable to print "áéíó" to terminal.