woosley has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am used to writing documentation in POD, but if I type some Chinese in the POD file, the perldoc command cannot display it correctly.
    =encoding utf8

    =head1 NAME

    测试perldoc
Running perldoc test.pod gives:
    TEST(1)        User Contributed Perl Documentation        TEST(1)

    NAME
           XXperldoc

    perl v5.10.0                  2009-12-30
Every Unicode character is replaced by an 'X'. How can I fix this?
Man, PerlMonks does not support Unicode either?

Replies are listed 'Best First'.
Re: Can perldoc support unicode?
by ikegami (Patriarch) on Dec 30, 2009 at 14:22 UTC

    Man, PerlMonks does not support Unicode either?

    Yes and no. The form contents must be encoded using iso-8859-1, but since PerlMonks accepts HTML, you can use HTML character entities (sequences that start with &) to represent any Unicode character.

    You tried to post characters outside of iso-8859-1, so your browser submitted them as entities. The problem is that those entities were inside code tags, so they were treated as literal text, not HTML.

    测试

    As for perldoc, perldoc -t worked... kinda. It issued a warning and it will only work if you have a UTF-8 terminal.

    I don't know if the failure to work without -t is a bug, but it sounds like it from the earlier reply to your post. There's no excuse for -t's behaviour, though. A bug report is in order.
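
The entity fallback described in the reply above can be imitated in a few lines of Perl. This is only an illustrative sketch: to_latin1_with_entities is a hypothetical helper name, not something PerlMonks or any browser provides, and it assumes the input is already a decoded character string.

```perl
use strict;
use warnings;

# Replace every character outside ISO-8859-1 (code point above 0xFF)
# with a decimal numeric character reference such as &#27979;.
# Hypothetical helper, written for illustration only.
sub to_latin1_with_entities {
    my ($text) = @_;
    $text =~ s/([^\x{00}-\x{FF}])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

# The NAME text from the question, written with escapes so that this
# script itself stays pure ASCII.
my $title   = "\x{6D4B}\x{8BD5}perldoc";
my $encoded = to_latin1_with_entities($title);

print "$encoded\n";   # prints "&#27979;&#35797;perldoc"
```

Outside code tags such references are rendered as the characters they name; inside code tags they are shown literally, which matches what happened to the original post.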

Re: Can perldoc support unicode?
by almut (Canon) on Dec 30, 2009 at 13:40 UTC

    The issue seems to be specific to the man page formatter; perldoc -o text ... (i.e. Pod::Text), for example, appears to work fine for me. From the Pod::Man documentation:

    utf8

    By default, Pod::Man produces the most conservative possible *roff output to try to ensure that it will work with as many different *roff implementations as possible. Many *roff implementations cannot handle non-ASCII characters, so this means all non-ASCII characters are converted either to a *roff escape sequence that tries to create a properly accented character (at least for troff output) or to X.

    If this option is set, Pod::Man will instead output UTF-8. If your *roff implementation can handle it, this is the best output format to use and avoids corruption of documents containing non-ASCII characters. However, be warned that *roff source with literal UTF-8 characters is not supported by many implementations and may even result in segfaults and other bad behavior.

    So I guess the question becomes how to tell perldoc to pass the utf8 option to the Pod::Man constructor...
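
One way to experiment with that question, bypassing perldoc entirely, is to call Pod::Man directly with the utf8 option quoted above. The following is a minimal sketch, assuming a Pod::Man that accepts utf8; pod_to_utf8_man and the scratch file names are made up for illustration, and it does not show how to make perldoc itself pass the option.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use File::Temp qw(tempdir);
use Pod::Man;

# Render a POD file to *roff while asking Pod::Man for UTF-8 output.
# Returns the generated man page source as a decoded character string.
# (Hypothetical helper name, for illustration only.)
sub pod_to_utf8_man {
    my ($pod_file, $man_file) = @_;

    my $parser = Pod::Man->new(utf8 => 1);
    $parser->parse_from_file($pod_file, $man_file);

    open my $in, '<:raw', $man_file or die "Cannot read $man_file: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    return decode('UTF-8', $bytes);
}

# Demo: recreate test.pod from the question in a scratch directory.
my $dir      = tempdir(CLEANUP => 1);
my $pod_file = "$dir/test.pod";
my $man_file = "$dir/test.1";

open my $out, '>:raw', $pod_file or die "Cannot write $pod_file: $!";
print {$out} encode('UTF-8',
    "=encoding utf8\n\n=head1 NAME\n\n\x{6D4B}\x{8BD5}perldoc\n");
close $out;

my $roff = pod_to_utf8_man($pod_file, $man_file);
```

Whether the generated file then displays correctly still depends on the *roff implementation and the terminal, as the Pod::Man documentation warns.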