bwana147 has asked for the wisdom of the Perl Monks concerning the following question:

In a recent thread, I was thinking about (and was probably not the first to do so) the issue of \n being conveniently interpreted in several ways depending on the platform perl runs on (e.g. LF on Unices, CR on MacOS, CR+LF on DOS/Windows). That is convenient when dealing with text files, but just plain annoying when you have to speak a network protocol that demands CR+LF as its line delimiter even though it deals mostly with text (e.g. SMTP).

Then tye rightly pointed out that writing \015\012 explicitly wouldn't prevent perl from inserting another \015 when running on Windows, so the line would go out as \015\015\012.
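The doubled carriage return can be reproduced on any platform by pushing PerlIO's :crlf layer explicitly, since that layer applies the same \n to \r\n translation a Windows text-mode handle does. A minimal sketch, assuming Perl 5.8 or later with PerlIO; bytes_after_text_mode_write is a hypothetical helper name:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $payload through the :crlf layer (the
# translation a Windows text-mode handle performs) and return the raw
# bytes that actually reached the file.
sub bytes_after_text_mode_write {
    my ($payload) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, '>:crlf', $path or die "open $path: $!";
    print {$out} $payload;
    close $out or die "close $path: $!";

    open my $in, '<:raw', $path or die "open $path: $!";
    local $/;                        # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

my $got = bytes_after_text_mode_write("HELO example.com\015\012");
printf "%vd\n", substr($got, -3);    # ordinals of the last three bytes
```

It prints 13.13.10, i.e. CR CR LF: the explicit \015 survives, and the layer adds another one in front of the \012.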

So what shall we do?

  1. use Socket qw( :crlf ): but this only provides constants for \015, \012 and \015\012.
  2. Use the code from CGI.pm that tries its best to determine what architecture it's running on and sets variables accordingly. Should this particular bit of code be made into a module of its own?
  3. Put the handle in binmode and use the aforementioned constants. Then what are the implications of this? Does binmode only affect the way \n is interpreted?
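Option 3 could look something like the minimal sketch below. It assumes Perl 5.8 or later and uses a temporary file from File::Temp as a stand-in for a real SMTP connection; write_protocol_lines is a hypothetical helper, not part of Socket or any other module.

```perl
use strict;
use warnings;
use Socket qw(:crlf);            # exports $CR, $LF, $CRLF (and CR, LF, CRLF)
use File::Temp qw(tempfile);

# Hypothetical helper: emit protocol lines terminated by an explicit
# CR LF pair. binmode switches off newline translation on platforms
# that have it, so the bytes go out exactly as written.
sub write_protocol_lines {
    my ($fh, @lines) = @_;
    binmode $fh or die "binmode: $!";
    print {$fh} map { "$_$CRLF" } @lines;
}

my ($fh, $path) = tempfile(UNLINK => 1);
write_protocol_lines($fh, 'HELO example.com', 'QUIT');
close $fh or die "close: $!";

# Read the file back without any translation to inspect the bytes.
open my $in, '<:raw', $path or die "open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;
print length($bytes), " bytes, CRLF-terminated: ",
      ($bytes eq "HELO example.com\015\012QUIT\015\012" ? "yes" : "no"), "\n";
```

Because the terminator is spelled as explicit bytes and the handle is in binmode, the output should be the same on Unix and Windows; the EBCDIC question raised by option 2 is not addressed here.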

Thanks for your time and enlightened remarks.

--bwana147

Re: binmode, CR, LF
by tadman (Prior) on Jun 14, 2001 at 16:54 UTC
    As in: use DownWithCRLF; Although I might have a few details wrong, the CR+LF combination dates back to the old teletypes used for telegrams, which needed separate characters to advance the paper (line feed) and to return the print head (carriage return). Why DOS picked up on this specifically is odd, but then it was probably the guy at Seattle Computer Products, not Bill Gates, who made this call.

    Anyway, I think binmode is the best call. #1 and #2 don't actually solve the problem, do they?
Re: binmode, CR, LF
by bikeNomad (Priest) on Jun 14, 2001 at 20:30 UTC
    Of course, if you're implementing a network protocol, you're probably using sockets, and send and recv don't do any character translation. Other than peculiar cases like CGI, where something reads a file handle and writes to a socket, this shouldn't be a problem. And CGI.pm already puts STDIN, STDOUT, and STDERR into binmode, so why not just use its $CRLF there?
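    To illustrate the point about send and recv, here is a minimal sketch that pushes a CRLF-terminated SMTP line through a connected pair of sockets and compares what comes out. It assumes a Unix-like system where socketpair with AF_UNIX is available; round_trip is a hypothetical name. On Unix the result is unsurprising; the point is that the code contains no newline-translation logic at all.

```perl
use strict;
use warnings;
use Socket qw(:DEFAULT :crlf);   # AF_UNIX, SOCK_STREAM, PF_UNSPEC, $CRLF

# Send $payload across a connected socket pair and return what arrives
# at the other end. Assumes AF_UNIX socketpair() is supported.
sub round_trip {
    my ($payload) = @_;
    socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
        or die "socketpair: $!";
    defined send($left, $payload, 0) or die "send: $!";

    # A stream socket may deliver the data in pieces, so read until
    # everything that was sent has arrived.
    my $received = '';
    while (length $received < length $payload) {
        defined recv($right, my $chunk, 1024, 0) or die "recv: $!";
        last if $chunk eq '';        # peer closed the connection
        $received .= $chunk;
    }
    return $received;
}

my $line = "MAIL FROM:<user\@example.com>$CRLF";
print round_trip($line) eq $line ? "bytes unchanged\n" : "bytes altered\n";
```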
(tye)Re: binmode, CR, LF
by tye (Sage) on Jun 14, 2001 at 20:58 UTC

    Network protocols aren't text files, so you should always use binmode. binmode isn't going to break anything, and it might fix things beyond \r\n translation. For example, without binmode under Win32, a CTRL-Z byte would signify end of file.

    That is why Perl's socket code under Win32 has: #define OPEN_SOCKET(x) win32_open_osfhandle(x,O_RDWR|O_BINARY) so that Perl sockets under Win32 are always in binmode. This makes me wonder if there are any platforms where binmode is needed but Perl sockets aren't binmode by default.

    You make a good point about not using \r and \n when doing network programming. CGI.pm makes a good point about not using \015 when on an EBCDIC machine, which means that Socket.pm is in need of a patch!

    And is IO::Socket a replacement for Socket.pm (or vice versa), or are they just two similar choices?

    This sounds like something worth turning into a tiny module that CGI.pm, Socket.pm, and IO::Socket could all use. (Note that the VMS-specific code in CGI.pm should stay in CGI.pm, since it works around a quirk of the most popular VMS web server, which doesn't follow the spec; it has nothing to do with how \r and \n get interpreted under VMS.)
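    A tiny module along those lines might start like the minimal sketch below. The package name Net::LineEnding and its two subs are invented for illustration. The EBCDIC test relies on "\t" being chr(9) under ASCII but chr(5) under EBCDIC, and returning the native "\r\n" on EBCDIC hosts merely follows what CGI.pm is described as doing, with no claim that it is right for every EBCDIC setup. The VMS quirk is deliberately left out.

```perl
use strict;
use warnings;

{
    # Hypothetical module (name and interface invented for illustration)
    # collecting the line-ending logic in one place.
    package Net::LineEnding;

    # "\t" is chr(9) under ASCII but chr(5) under EBCDIC, while the octal
    # escape "\011" is always chr(9), so they differ only on EBCDIC.
    sub is_ebcdic { return "\t" ne "\011" }

    # Assumption: on EBCDIC hosts use the native "\r\n", as CGI.pm is
    # said to do; everywhere else emit the literal network bytes CR LF.
    sub crlf { return is_ebcdic() ? "\r\n" : "\015\012" }
}

printf "EBCDIC: %s, CRLF ordinals: %vd\n",
    (Net::LineEnding::is_ebcdic() ? 'yes' : 'no'),
    Net::LineEnding::crlf();
```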

            - tye (but my friends call me "Tye")