in reply to Re^2: Portable length() in bytes.
in thread Portable length() in bytes.
Re^4: Portable length() in bytes.
by William G. Davis (Friar) on Nov 07, 2004 at 23:19 UTC
No, you wouldn't, but what if I have a function like this:
and someone calls it like this:
and the function calls syswrite() with a LENGTH taken from length(), blindly assuming it will return the right value. The keyword here is portability. The idea is a length()-like function that returns the length in bytes regardless of your Perl version, so that you can write code that targets, say, Perl 5.005 but also works with Unicode-enabled Perls 5.6.1 and up.
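A minimal sketch of the kind of helper being described, under stated assumptions: byte_length() is a hypothetical name that does not appear in the thread, and the version check assumes the bytes pragma is available from Perl 5.6 onward. Note that bytes::length() reports the size of Perl's internal buffer, so it is only a reasonable stand-in for "bytes written" when the string is going out as UTF-8 or contains no characters above 0x7F.

```perl
use strict;

# Hypothetical helper (not from the thread): return the number of octets
# Perl holds internally for a string, on old and new Perls alike.
sub byte_length {
    my ($string) = @_;

    # Assumption: the bytes pragma exists on Perl 5.6 and later.
    if ($] >= 5.006) {
        require bytes;                    # loaded at run time, so 5.005 never sees it
        return bytes::length($string);    # octets in the internal representation
    }

    # Before 5.6 strings are plain byte strings, so length() already counts bytes.
    return length($string);
}

print byte_length("abc"), "\n";              # prints 3
print byte_length("\x{263A}" x 3), "\n";     # prints 9 (three 3-octet characters)
```

A caller worried about portability could then pass byte_length($data) wherever a byte count is needed, instead of length($data).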
by ysth (Canon) on Nov 07, 2004 at 23:34 UTC
The "length" passed to syswrite is useless; it expects and returns character length and offset. And whether the string being output is 1 byte or 2 bytes, it's just one character, and will be output as either 1 or 2 bytes depending on the output filehandle, not on how perl has it encoded.
by William G. Davis (Friar) on Nov 08, 2004 at 00:37 UTC
Correct me if I'm wrong, but what you're saying is that if you syswrite() to a filehandle binmode()'d as :utf8 and pass the result of length() as LENGTH, everything works fine, because syswrite() interprets the LENGTH parameter as a count of UTF-8 characters, not bytes?
First, what you're talking about only works with Perl 5.8+. Prior versions of Perl do not have the :utf8 layer for binmode().
Then you said this, which stumped me: "The "length" passed to syswrite is useless; it expects and returns character length and offset."
Well, here's what 5.8's perldoc -f syswrite says:
syswrite FILEHANDLE,SCALAR,LENGTH,OFFSET
syswrite FILEHANDLE,SCALAR,LENGTH
syswrite FILEHANDLE,SCALAR
Attempts to write LENGTH bytes of data from
variable SCALAR to the specified FILEHANDLE, using the
system call write(2). If LENGTH is not specified, writes
whole SCALAR. It bypasses buffered IO, so mixing this with
reads (other than sysread()), print, write, seek, tell, or
eof may cause confusion because the perlio and stdio layers
usually buffers data. Returns the number of bytes
actually written, or undef if there was an error (in this
case the errno variable $! is also set). If the LENGTH is
greater than the available data in the SCALAR after the
OFFSET, only as much data as is available will be written.
An OFFSET may be specified to write the data from some
part of the string other than the beginning. A negative
OFFSET specifies writing that many characters counting
backwards from the end of the string. In the case the SCALAR
is empty you can use OFFSET but only zero offset.
Note that if the filehandle has been marked as :utf8,
Unicode characters are written instead of bytes (the LENGTH,
OFFSET, and the return value of syswrite() are in UTF-8
encoded Unicode characters). The :encoding(...) layer
implicitly introduces the :utf8 layer. See "binmode",
"open", and the open pragma, open.
Which means that under 5.8 you can get away with passing syswrite() UTF-8 strings (and you can also drop the LENGTH parameter altogether, as it's been optional since 5.6.1), but that still doesn't address the issue of portability. Can you guarantee me that this bit of code:
will work with any version of perl going back to 5.005? (Note the word "Portable" in the node title.) Here's an example that seems to break under 5.6.1, unless I'm missing something:
_˙¦_˙¦_˙¦ _˙¦
It seems like the third syswrite() is getting back 3 from length(), meaning three characters, which syswrite() interprets to be 3 bytes, so only the first smiley face gets written. I binmode()'d STDOUT and still got the same thing.
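The arithmetic behind that observation can be sketched roughly as follows. This is not the code from the post; it assumes the smiley is U+263A (WHITE SMILING FACE), a Perl of 5.6 or later, and hypothetical variable names.

```perl
use strict;
use bytes ();    # load bytes::length() without switching on byte semantics

# Assumption: the smiley in the post is U+263A, which takes 3 octets in UTF-8.
my $smileys = "\x{263A}" x 3;

my $chars  = length($smileys);           # counts characters
my $octets = bytes::length($smileys);    # counts octets in the internal UTF-8 form

my $octets_per_char = $octets / $chars;

print "characters: $chars\n";                   # prints "characters: 3"
print "octets: $octets\n";                      # prints "octets: 9"
print "octets per smiley: $octets_per_char\n";  # prints "octets per smiley: 3"
```

If a LENGTH of 3 is treated as a byte count, it covers exactly the 3 octets of the first smiley, which matches the single character described above.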
by ysth (Canon) on Nov 08, 2004 at 05:58 UTC
by William G. Davis (Friar) on Nov 08, 2004 at 08:40 UTC