IO::Socket::INET newline conversions and buffering

7stud has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

It's my understanding that the 'readers': <> and read() are filtered through the C stdio library, and that stdio automatically does:

newline conversions
buffering

It's also my understanding that IO::Socket::INET sockets are created with autoflush turned on, which eliminates buffering on those sockets. But what about newline conversions? As far as I can tell, newline conversions aren't affected by autoflush(). So it seems to me that if you want to prevent newline conversions while reading from a socket, you have to either:

use sysread()
call binmode() on the socket and then use <> or read()

However, on p. 119 of "Network Programming with Perl", the author points out that when reading the message body of an HTTP response you must be prepared to read binary data, e.g. an MP3 file, where you don't want newline conversions. The author then uses the following code to read the body of an HTTP response:

print $data while read($socket, $data, 1024) > 0;

where $socket is an IO::Socket::INET socket. Isn't the read() there going to do newline conversions?
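One way to check this empirically is to push bytes that contain CR, LF and CRLF through a real socket and compare what arrives. The following is only a sketch: the loopback listener, the `$payload` bytes and all variable names are made up for illustration, and the explicit `binmode` calls are the defensive option from the list above rather than something the book's code does.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Loopback listener on an OS-chosen port, so no external host is needed.
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen failed: $!";

my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

my $server = $listener->accept or die "accept failed: $!";

# Hypothetical "binary" payload containing NUL, CRLF, a high byte, LF and CR.
my $payload = "\x00\x0D\x0A\xFF\x0A\x0D";

binmode($server);              # make sure nothing rewrites bytes on output
print {$server} $payload;
close $server;                 # flushes and signals end-of-file to the client

binmode($client);              # make sure nothing rewrites bytes on input
my ($received, $chunk) = ('', '');
while (1) {
    my $n = read($client, $chunk, 1024);
    die "read failed: $!" unless defined $n;
    last if $n == 0;           # 0 means end-of-file
    $received .= $chunk;
}
close $client;

print $received eq $payload ? "bytes unchanged\n" : "bytes altered\n";
```

Removing the two `binmode` calls and rerunning on the platform in question shows whether `read()` alone alters anything there.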


Re: IO::Socket::INET newline conversions and buffering by almut (Canon) on Feb 21, 2010 at 17:55 UTC
"It's my understanding that the 'readers': <> and read() are filtered through the C stdio library"

I don't think that's correct, at least not for most modern perls that are built to use PerlIO. As can be shown using `ltrace`, the PerlIO read/write operations eventually map directly to the system calls `read(2)` and `write(2)`, i.e. they do not go through the respective stdio/libc calls. In other words, AFAICT, any buffering and newline conversion behavior that applies to stdio is irrelevant here.
Re: IO::Socket::INET newline conversions and buffering by ikegami (Patriarch) on Feb 21, 2010 at 18:17 UTC
Since Perl 5.8, Perl uses an IO stack called PerlIO. Layers can be added to the file handle to perform tasks such as LF<=>CRLF conversion and character encoding.

It's possible to override the defaults when you create a file handle. It's also possible to modify the layers later on using `binmode`.

It appears that `connect` does not add the default layers, so it effectively calls `binmode`. There's no harm in calling `binmode` yourself to be safe. `binmode` with no layer argument (i.e. called with only the handle) removes/disables any `:crlf` and `:encoding` layers.
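To make the effect of a `:crlf` layer and of `binmode` visible without a network or a Windows machine, here is a minimal sketch that reads the same bytes twice from a temporary file. The helper `slurp_with` and the sample text are invented for this example; the `:crlf` layer is requested explicitly so the behavior does not depend on platform defaults.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Sample bytes as they might arrive "on the wire", with CRLF line endings.
my $wire = "line one\r\nline two\r\n";

# Write them to a temporary file without any translation.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} $wire;
close $out or die "close failed: $!";

# Hypothetical helper: open $path with the given mode, optionally strip
# the layers with binmode, and return the whole content.
sub slurp_with {
    my ($mode, $strip) = @_;
    open(my $fh, $mode, $path) or die "open failed: $!";
    binmode($fh) if $strip;
    local $/;                  # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

my $converted = slurp_with('<:crlf', 0);   # :crlf layer left in place
my $untouched = slurp_with('<:crlf', 1);   # :crlf layer removed by binmode

printf "with :crlf    -> %d bytes\n", length $converted;
printf "after binmode -> %d bytes\n", length $untouched;
```

With the layer in place each CRLF pair is read as a single LF; after `binmode` the bytes come back exactly as written.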
Re: IO::Socket::INET newline conversions and buffering by 7stud (Deacon) on Feb 23, 2010 at 15:07 UTC
Thanks for the responses.

almut, thanks for investigating with ltrace. I don't have ltrace on Mac OS X 10.4.11, so I was not able to try that out for myself.

OK, so now perl doesn't call the C stdio library to establish buffering and newline conversions. Instead, as ikegami said, perl uses its own PerlIO 'layers'.

Edit: Actually, you can still direct perl to use the C stdio library if you want:

":stdio
Layer which calls fread, fwrite and fseek/ftell etc. Note that as this is "real" stdio it will ignore any layers beneath it and go straight to the operating system via the C library as usual."

I'm not sure how that is useful, though.

ikegami, I'm not quite understanding what connect() has to do with which PerlIO layers are added to an IO::Socket::INET socket. Wouldn't new() be the method in which the layers were established for the socket? I don't have Windows, so I can't test out newline conversions. Or were you just checking every method to see if any of them added a newline conversion layer to the socket?
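Which layers a socket actually ends up with can be inspected directly with `PerlIO::get_layers`, whichever method is responsible for setting them up. The sketch below assumes a loopback listener purely so that it runs without an external host; the variable names are illustrative, and the layer list it prints depends on the platform and perl build.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Loopback listener on an OS-chosen port (an assumption for this sketch).
my $listener = IO::Socket::INET->new(
    Listen    => 1,
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
) or die "listen failed: $!";

# new() with PeerAddr creates the socket and connects it in one call.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

# Layers on the handle as returned by new(), then again after binmode.
my @before = PerlIO::get_layers($client);
binmode($client);
my @after  = PerlIO::get_layers($client);

print "layers after new():   @before\n";
print "layers after binmode: @after\n";

# Count of newline-translating layers left after binmode.
my $crlf_left = grep { $_ eq 'crlf' } @after;
print "crlf layers remaining: $crlf_left\n";
```

Running this on a Windows perl is the interesting case: if `crlf` shows up in the first list, `read()` and `<>` would translate newlines unless `binmode` is called.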