in reply to Unicode word wrapping
Input and Output Disciplines
There is currently no easy way to mark data read from a file or other external source as being utf8. This will be one of the major areas of focus in the near future.
So part of the problem may be this: you expect your query parameter to be encoded in UTF-8 (I'm assuming), but your script just sees a sequence of bytes, which it treats as extended-ASCII characters. You might be able to get around this by explicitly using pack "U",... to reconstruct the characters from the input one at a time, but I don't recall whether I ever got that technique to work reliably.
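For comparison, here is a minimal sketch of the decoding step on a newer Perl. It does not use the pack "U" trick; it assumes Perl 5.8 or later, where the Encode module is part of the core distribution. The variable names and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw input: the UTF-8 bytes for "caf" followed by e-acute,
# as a script might receive them in a query parameter.
my $raw = "caf\xC3\xA9";

# Turn the byte string into a Perl character string.
my $text = decode('UTF-8', $raw);

# The two trailing bytes have become a single character, U+00E9.
printf "characters: %d, last character: U+%04X\n",
    length($text), ord(substr($text, -1));
```

Before decoding, Perl sees five separate one-byte characters; after decoding, it sees four characters.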
If you're just trying to ensure that an input string doesn't exceed a particular character length, you should be able to use length($string) to get its length in characters rather than bytes. That assumes that you already have it stored internally as UTF-8, of course, and that you haven't done a use bytes.
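A rough sketch of that length check, assuming the string is already held internally as characters. The helper name too_long and the limit of 5 are hypothetical; bytes::length comes from the standard bytes pragma and is shown only for contrast.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without switching on byte semantics

# Hypothetical helper: true if the string has more than $max characters.
sub too_long {
    my ($string, $max) = @_;
    return length($string) > $max;
}

# Greek alpha, beta, gamma: three characters, two bytes each in UTF-8.
my $string = "\x{3b1}\x{3b2}\x{3b3}";

my $chars = length($string);          # counts characters
my $bytes = bytes::length($string);   # counts bytes of the internal UTF-8

print "chars=$chars bytes=$bytes\n";
print too_long($string, 5) ? "too long\n" : "fits\n";
```

With a use bytes in scope, plain length would report the byte count instead, which is why the caveat above matters.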
Unicode support in 5.8 is supposed to be much improved, but I haven't had a chance to try it for myself yet.
$perlmonks{seattlejohn} = 'John Clyman';
Re: Re: Unicode word wrapping
by lestrrat (Deacon) on Dec 09, 2002 at 08:56 UTC