Update 12/7/03:

Apparently, the POE people had to deal with this very same problem. See how they did it.


Hi Monks.

At the moment, I'm working on some network libraries and trying to add Unicode support to them. My problem is that Perl's system I/O functions (the ones I'm using are sysread() and syswrite()) all take the length to read or write in bytes, and yet I can't seem to find any portable way to get that information.

As of 5.6.1, strings are stored internally as UTF-8, and all built-in functions that purport to operate on characters do operate on characters. The one that matters here is length(), which now returns the length in characters rather than the length in bytes.

To force length() to return the length in bytes, perlunicode says you can use the bytes pragma, as this example illustrates:

#!/usr/bin/perl -w
use 5.6.1;
use strict;

# three smiley faces:
my $string = "\x{263a}\x{263a}\x{263a}";

printf("%s: %d characters\n", $string, length $string);

{
    use bytes;
    printf("%s: %d bytes\n", $string, length $string);
}

That's all well and good, but unfortunately, the bytes pragma was introduced as of 5.6.1, and my libraries are supposed to support perl back to 5.005. I can't wrap "use bytes;" in an eval block, so what can I do?
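One possible workaround, offered only as a rough sketch: load bytes.pm with require inside an eval at compile time and call bytes::length() as an ordinary function, falling back to the plain length() on perls that have no bytes.pm (and so no character semantics to worry about). This assumes bytes::length() is available wherever bytes.pm loads. The names byte_length() and $HAVE_BYTES are made up for illustration, and this is not necessarily how POE handles it.

```perl
#!/usr/bin/perl -w
use strict;

# Try to load bytes.pm at compile time. On a perl that lacks it, the
# require dies inside the eval and $HAVE_BYTES stays false.
my $HAVE_BYTES;
BEGIN { $HAVE_BYTES = eval { require bytes; 1 } ? 1 : 0 }

# byte_length() is a hypothetical helper name, not part of any library.
sub byte_length {
    my ($string) = @_;

    # bytes::length() is a plain function call, so no lexically scoped
    # "use bytes;" is needed here. Without bytes.pm, length() is assumed
    # to already count bytes.
    return $HAVE_BYTES ? bytes::length($string) : length($string);
}

# three smiley faces (this literal itself needs a Unicode-aware perl):
my $string = "\x{263a}\x{263a}\x{263a}";

printf("%d characters, %d bytes\n", length($string), byte_length($string));
```

The value from byte_length() is what would then be handed to syswrite() as the length argument.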


In reply to Perl + Unicode == Networking Woes by William G. Davis
