Gorby has asked for the wisdom of the Perl Monks concerning the following question:

Hello Wise Monks! I encountered this message on the command line after I ran my perl program:

Malformed UTF-8 character (unexpected non-continuation byte 0x4f, immediately after start byte 0xd1) in uc at tvucp.cgi line 1970.

What does this mean?

Thanks in advance.

Gorby

Re: What does UTF-8 mean?
by iburrell (Chaplain) on May 17, 2004 at 16:12 UTC
    That error message means that a string Perl thinks is in Unicode is not valid UTF-8. This is probably because it is not UTF-8 at all but some other character encoding.
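
    As a rough sketch of what the message is complaining about (only the two byte values come from the error text; the variable names are made up for illustration): in UTF-8 a start byte like 0xd1 announces a two-byte character, so the next byte must be a continuation byte in the range 0x80..0xbf. 0x4f is a plain ASCII 'O', so a strict decode fails.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes named in the error message.
my $bytes = "\xD1\x4F";
my ($start, $next) = unpack 'C2', $bytes;

# Bit patterns defined by UTF-8:
#   110xxxxx = start of a two-byte sequence, 10xxxxxx = continuation byte.
my $is_two_byte_start = ($start & 0xE0) == 0xC0;
my $is_continuation   = ($next  & 0xC0) == 0x80;

# Strict decode: FB_CROAK makes decode() die on malformed input.
# decode() may modify its input when a CHECK value is given, so pass a copy.
my $text = eval { decode('UTF-8', my $copy = $bytes, Encode::FB_CROAK) };
my $is_valid_utf8 = defined $text ? 1 : 0;

printf "0x%02x starts a two-byte sequence: %s\n", $start, $is_two_byte_start ? 'yes' : 'no';
printf "0x%02x is a continuation byte: %s\n",     $next,  $is_continuation   ? 'yes' : 'no';
printf "bytes decode as strict UTF-8: %s\n",      $is_valid_utf8 ? 'yes' : 'no';
```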

    To fix this, we need to see at least that area of the code. We also need to know where the string is coming from, and whether the script is using 'use utf8' to interpret the program source as Unicode.

    To answer the subject, UTF-8 is a character encoding for representing Unicode characters. Perl uses UTF-8 as the internal representation of Unicode strings. In Perl, strings can be marked as Unicode or byte strings.
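
    A minimal sketch of the byte string versus Unicode string distinction, assuming (purely as a guess, since we have not seen the code) that the data is really ISO-8859-1. The sample bytes are invented for illustration. The idea is to decode the incoming bytes from their real encoding, work on the resulting character string, and encode again on output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input bytes, assumed to be ISO-8859-1 text.
my $bytes = "\xD1O\xF1o";

# A plain byte string is not marked as Unicode.
my $flag_before = utf8::is_utf8($bytes) ? 1 : 0;

# Decode from the (assumed) source encoding into a character string.
my $chars = decode('ISO-8859-1', $bytes);
my $flag_after = utf8::is_utf8($chars) ? 1 : 0;

# uc now operates on characters rather than raw bytes.
my $upper = uc $chars;

# Encode back to bytes for output, this time as real UTF-8.
my $out = encode('UTF-8', $upper);

print "marked as Unicode before decode: $flag_before\n";
print "marked as Unicode after decode:  $flag_after\n";
printf "output bytes: %s\n", join ' ', map { sprintf '%02x', $_ } unpack 'C*', $out;
```

    If the real encoding turns out to be something else, only the name passed to decode would change.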

Re: What does UTF-8 mean?
by halley (Prior) on May 17, 2004 at 17:23 UTC
    A more literal answer is that the program expected text encoded as UTF-8 (Unicode Transformation Format, 8-bit) but got something else instead. I wrote a primer on character encoding earlier: FMTYEWTK about Characters vs Bytes

    --
    [ e d @ h a l l e y . c c ]

Re: What does UTF-8 mean?
by pbeckingham (Parson) on May 17, 2004 at 16:03 UTC

    It means that you need to include some sample code in your question (line 1970 of tvucp.cgi, for a start), along with any relevant encoding manipulations that you are using.

    Then we'll tell you why uc is complaining.

Re: What does UTF-8 mean?
by grantm (Parson) on May 24, 2004 at 02:00 UTC