William G. Davis has asked for the wisdom of the Perl Monks concerning the following question:
Update 12/7/03:
Apparently, the POE people had to deal with this very same problem. See how they did it.
Hi Monks.
At the moment, I'm working on some network libraries and trying to add Unicode support to them. My problem is that Perl's system I/O functions (the ones I'm using are sysread() and syswrite()) all take the length to read or write in bytes, and yet I can't seem to find any portable way to get a string's length in bytes.
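To make the constraint concrete, here is a minimal sketch of a write loop built on syswrite(). The helper name `write_all` is hypothetical, not part of any library, and the sketch assumes the buffer already holds plain bytes. That assumption is exactly what stops being true once a string contains wide characters, because length() then counts characters rather than bytes.

```perl
use strict;

# Hypothetical helper: write the whole buffer to $fh, retrying after
# partial writes. ASSUMPTION: $octets holds bytes, not wide characters,
# so length() really is the byte count that syswrite() expects.
sub write_all {
    my ($fh, $octets) = @_;
    my $total  = length $octets;
    my $offset = 0;

    while ($offset < $total) {
        # Four-argument form: buffer, byte count, offset into the buffer.
        my $written = syswrite($fh, $octets, $total - $offset, $offset);
        return unless defined $written;    # error: caller inspects $!
        $offset += $written;
    }

    return $offset;    # number of bytes written
}
```

On a non-blocking socket a failed syswrite() may simply mean "try again later", so a real caller would check $! before giving up.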
As of 5.6.0, strings can be stored internally as UTF-8, and all built-in functions that purport to operate on characters do operate on characters. That includes length(), which now returns the length in characters as opposed to the length in bytes.
To force length() to return the length in bytes, perlunicode says you can use the bytes pragma, as this example illustrates:
    #!/usr/bin/perl -w
    use 5.6.1;
    use strict;

    # three smiley faces:
    my $string = "\x{263a}\x{263a}\x{263a}";

    printf("%s: %d characters\n", $string, length $string);

    {
        use bytes;
        printf("%s: %d bytes\n", $string, length $string);
    }
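That's all well and good, but unfortunately, the bytes pragma was only introduced in 5.6.0, and my libraries are supposed to support perl back to 5.005. I can't just wrap "use bytes;" in an eval block, because use is processed at compile time, so an older perl fails on the missing module before the eval ever runs. So what can I do?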
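One possible way around this, offered as a sketch rather than a definitive answer: load the pragma with require inside an eval within a BEGIN block, and call its import() by hand. BEGIN still runs at compile time, so the lexical hint reaches the code that follows in the enclosing block, while a perl without bytes.pm just fails the require quietly and keeps its old byte-oriented length(). The helper name `byte_length` is hypothetical, and the demo string uses the \x{...} escape, so the last two lines only make sense on a perl that has Unicode support.

```perl
use strict;

{
    # Load the bytes pragma only if this perl ships it. Calling import()
    # from BEGIN sets the compile-time hint for the rest of this block.
    # On an older perl the require dies inside eval and nothing changes.
    BEGIN { eval { require bytes; bytes->import() } }

    # Hypothetical helper: length of the argument in bytes.
    sub byte_length { return length $_[0] }
}

# Outside the block, length() keeps its character semantics.
my $string = "\x{263a}\x{263a}\x{263a}";    # three smiley faces
printf("%d characters, %d bytes\n", length($string), byte_length($string));
# prints "3 characters, 9 bytes" on a perl that has the bytes pragma
```

Keeping the BEGIN block and the helper inside their own bare block limits the byte semantics to that one function instead of the whole file.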
Re: Perl + Unicode == Networking Woes
by liz (Monsignor) on Nov 24, 2003 at 20:48 UTC
by Roger (Parson) on Nov 24, 2003 at 22:31 UTC
by William G. Davis (Friar) on Nov 24, 2003 at 21:14 UTC

Re: Perl + Unicode == Networking Woes
by Roy Johnson (Monsignor) on Nov 24, 2003 at 20:49 UTC
by William G. Davis (Friar) on Nov 24, 2003 at 21:11 UTC

Somewhat off topic, but...
by William G. Davis (Friar) on Nov 24, 2003 at 21:21 UTC

Re: Perl + Unicode == Networking Woes
by Roger (Parson) on Nov 24, 2003 at 20:36 UTC
by William G. Davis (Friar) on Nov 24, 2003 at 20:55 UTC