Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

Has anyone any experience with Text::Wrap and wrapping text in languages that are not written in the Roman alphabet? I'm sure it works fine for English (where the definition of "fine" may vary from person to person), and probably fine for other Roman-letter-based languages, such as French, Italian, or Portuguese. However, I'm more curious about anyone having experience with it in other languages, such as Russian, Hebrew, Arabic, Hindi, and, what seems to me to be the killer, Asian languages such as Chinese (Simplified or Traditional), Korean, and Japanese. (OK, I realise that Hindi is an Asian language, but I'm not sure whether its glyphs are single-sounded, like Roman glyphs, or syllabic, like Chinese.)

What would seem useful to me is a wrapper around Locale::Maketext that would automatically wrap the text based on the current screen width (to be determined separately). This would allow the developer (and translator) to ignore line lengths in most circumstances (especially for corner cases where the text, after variable interpolation, would end up near the normal 80-character limit).
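
A minimal sketch of such a wrapper, assuming the screen width is passed in by the caller. The class names `MyApp::L10N` and `MyApp::L10N::en` and the method `maketext_wrapped` are hypothetical, made up for illustration; only `Locale::Maketext` and `Text::Wrap` themselves are real (both ship with Perl). It counts characters, not display cells, so it inherits whatever limits Text::Wrap has for wide scripts.

```perl
use strict;
use warnings;
use Text::Wrap ();

# Hypothetical application base class; the name is an assumption.
package MyApp::L10N;
use parent 'Locale::Maketext';

# Translate as usual, then wrap the interpolated result to $width columns.
sub maketext_wrapped {
    my ($self, $width, $key, @args) = @_;
    my $text = $self->maketext($key, @args);
    # Text::Wrap keeps every line shorter than $columns, so add one.
    local $Text::Wrap::columns = $width + 1;
    return Text::Wrap::wrap('', '', $text);
}

# Minimal English lexicon; _AUTO lets unknown keys pass through unchanged.
package MyApp::L10N::en;
use parent -norequire, 'MyApp::L10N';
our %Lexicon = (_AUTO => 1);

package main;

my $lh = MyApp::L10N->get_handle('en') or die "no language handle";
print $lh->maketext_wrapped(
    30,
    'The file [_1] could not be copied to [_2] because the disk is full.',
    'report.txt', '/backup',
), "\n";
```

Wrapping after interpolation means neither the developer nor the translator has to guess how long the substituted values will be.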

If you have experience using other formatting modules, such as Text::Autoformat (is that even a Damian-approved Damian module?), with other languages, I'd like to hear about that, too. Since I don't actually have any Chinese text to play with, and even if I did, I wouldn't be able to tell whether the formatting made sense or not, I'm looking for some broader experience.

Thanks,

Re: Text::Wrap and non-Roman languages
by zwon (Abbot) on Jan 08, 2010 at 22:40 UTC

    The problem with Chinese is that its characters are double width, and Text::Wrap doesn't take this into account. Belarusian text is wrapped correctly.

    use strict;
    use warnings;
    use 5.010;
    use utf8;
    use open ':std', ':utf8';
    use Text::Wrap 'wrap';
    
    $Text::Wrap::columns=13;
    say wrap('', '', "1778 Джэймс Кук адкрыў Сэндвічавы (Гавайскія) астравы");
    say wrap('', '', "中国湖南省湘潭县谭家山镇立胜煤矿发生一起井下电缆起火事故,造成至少25名矿工遇难");
    __END__
    1778 Джэймс
    Кук адкрыў
    Сэндвічавы
    (Гавайскія)
    астравы
    中国湖南省湘潭县谭家山镇
    立胜煤矿发生一起井下电缆
    起火事故,造成至少25名
    矿工遇难
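
The width problem can be sketched with Perl's built-in Unicode properties. This is only an illustration, not what any of the modules in this thread do internally: the helper names `display_width` and `wrap_by_columns` are hypothetical, and it assumes that East Asian Wide and Fullwidth characters occupy two terminal cells and everything else occupies one. Combining marks and language-specific line-breaking rules are ignored, and a reasonably recent Perl is assumed for the `East_Asian_Width` property.

```perl
use strict;
use warnings;
use utf8;

# Cells a string occupies, assuming Wide/Fullwidth characters take two
# cells and every other character takes one (an approximation).
sub display_width {
    my ($str) = @_;
    my $wide = () = $str =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/g;
    return length($str) + $wide;
}

# Greedy wrap for text without spaces: start a new line before the
# character that would push the current line past $max_cols cells.
sub wrap_by_columns {
    my ($str, $max_cols) = @_;
    my @lines = ('');
    for my $char (split //, $str) {
        if (length $lines[-1]
            && display_width($lines[-1] . $char) > $max_cols) {
            push @lines, '';
        }
        $lines[-1] .= $char;
    }
    return @lines;
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n"
    for wrap_by_columns('中国湖南省湘潭县谭家山镇立胜煤矿发生一起井下电缆起火事故', 12);
```

Under that assumption a 12-cell limit allows six Chinese characters per line, where the Text::Wrap output above places twelve.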
    
Re: Text::Wrap and non-Roman languages
by stefbv (Priest) on Jan 09, 2010 at 12:34 UTC

      Thanks - I overlooked that one. A brief test with zwon's code, and it looks like it works, as long as I don't "use utf8;" or use -CS on the command line. I'm not sure why... I suppose because the input I got wasn't actually utf8, but I don't know what else it could be when it's displaying fine here :-) (And konsole thinks that the encoding it's using is UTF-8, and LANG is set to en_US.utf8...) So I'm just confused as to why explicitly marking things as utf8 is breaking it.

      I'm sure that'll be a question for the future.
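
Whether a wrapping module counts correctly can depend on whether it is handed decoded characters or raw UTF-8 octets. A minimal sketch of that difference, using only the core Encode module; the sample word is taken from zwon's example, and nothing here is claimed to be the cause of the behaviour described above.

```perl
use strict;
use warnings;
use utf8;                        # the literal below is a character string
use Encode qw(encode decode);

my $chars = 'астравы';                    # 7 Cyrillic letters
my $bytes = encode('UTF-8', $chars);      # the same text as raw octets

# length() counts characters in the first case and octets in the second.
printf "characters: %d\n", length $chars;
printf "octets:     %d\n", length $bytes;

# Decoding the octets gives back a character string of the original length.
my $round_trip = decode('UTF-8', $bytes);
printf "decoded:    %d\n", length $round_trip;
```

Any code that measures line length with `length` sees two different numbers for the same visible text, depending on which form it receives.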

        For me it works with use utf8 but issues the warnings:

        substr outside of string at /usr/share/perl5/Text/WrapI18N.pm line 130.
        Use of uninitialized value $text in length at /usr/share/perl5/Text/WrapI18N.pm line 52.

        Update: after investigating a bit more I also found Text::LineFold. It looks like it does the right thing, though it also ignores the utf8 flag on strings.

        use strict;
        use warnings;
        use 5.010;
        use utf8;
        use open ':std', ':utf8';
        use Text::LineFold;

        my $lf = Text::LineFold->new(
            ColumnsMax => 12,
        );
        while (<>) {
            my $folded = $lf->fold($_);
            utf8::decode($folded);
            say $folded;
        }