Re: Unicode word wrapping

What version of Perl are you using? 5.6.x exhibits some weird behavior with respect to Unicode. In particular, as perldoc perlunicode explains:

Input and Output Disciplines
There is currently no easy way to mark data read from a file or other external source as being utf8. This will be one of the major areas of focus in the near future.

So part of the problem may be this: you expect your query parameter to be encoded in UTF-8 (I'm assuming), but your script just sees a sequence of individual bytes, each treated as an extended-ASCII character. You might be able to get around this by explicitly using pack "U", ... to reconstruct the UTF-8 characters from the input one at a time, but I don't recall whether I ever got that technique to work reliably.
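
To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It assumes Perl 5.8 or later, where the Encode module ships with the core distribution, so it will not run on 5.6. The variable names and the sample input are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw input: the two UTF-8 bytes for U+00E9 (e with acute),
# as a CGI script might receive them in a query parameter.
my $raw = "\xC3\xA9";

# Without decoding, Perl sees two separate one-byte characters.
my $before = length($raw);

# decode() turns the UTF-8 byte string into a character string.
my $text  = decode('UTF-8', $raw);
my $after = length($text);

print "before decoding: $before, after decoding: $after\n";
```

Until the input has been decoded like this, anything that counts or splits characters is really operating on bytes.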

If you're just trying to ensure that an input string doesn't exceed a particular length in characters, length($string) should return its length in characters rather than bytes. That assumes, of course, that Perl already has the string stored internally as UTF-8 and that use bytes is not in effect.
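
A small sketch of the difference, assuming a string that Perl already holds as characters. The sample string is hypothetical and built from code points so the script does not depend on the source file's encoding.

```perl
use strict;
use warnings;

# Three Japanese characters, given as code points (U+65E5 U+672C U+8A9E).
my $string = "\x{65E5}\x{672C}\x{8A9E}";

# length() counts characters as long as "use bytes" is not in effect.
my $chars = length($string);

# Inside a "use bytes" scope, length() counts the bytes of the
# internal UTF-8 representation instead.
my $bytes = do { use bytes; length($string) };

print "$chars characters, $bytes bytes\n";
```

The pragma is lexically scoped, so confining it to the do block leaves length() counting characters everywhere else.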

Unicode support in 5.8 is supposed to be much improved, but I haven't had a chance to try it for myself yet.

$perlmonks{seattlejohn} = 'John Clyman';

Re: Re: Unicode word wrapping by lestrrat (Deacon) on Dec 09, 2002 at 08:56 UTC
Sorry, I guess I wasn't clear. First, I'm using Perl 5.8. As for the input from the CGI, it's originally in EUC-JP, and I convert it to UTF-8 when I receive it from the browser. This is because we eventually shove it into XML. I want to wrap THAT UTF-8 string at a certain column.