in reply to Re^5: problem with 'bare LF' in script
in thread problem with 'bare LF' in script

A key quote from the documentation (perlport) is "Perl uses \n to represent the "logical" newline", which implies that \n may not be \x0a. It then goes on to say that in MacPerl \n is \x0d.

By default Perl's text IO layers translate between the host OS's notion of a line end sequence and the character used internally to represent \n. If a string being manipulated hasn't been through an appropriate IO layer you will be dealing with the original CR and LF characters that were used. In various situations that means that an interpolated \n will not match LF (\x0a) and in many more situations it will not match an actual line end sequence (in Windows in particular).
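
As a minimal sketch of that last point, assuming a Perl where "\n" is LF (Unix, Windows, Mac OS X): the string below stands in for a line that never went through a translating IO layer. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# A line that arrived with a CR+LF ending and was never passed through a
# translating IO layer (for example, data read from a raw socket).
my $untranslated = "some text\x0D\x0A";

# chomp removes the current value of $/ (by default "\n"). Where "\n" is a
# single LF, only the LF goes and the CR is left behind.
my $chomped = $untranslated;
chomp $chomped;

# Naming the bytes explicitly strips either style of ending.
my $stripped = $untranslated;
$stripped =~ s/\x0D?\x0A\z//;

printf "after chomp: %d chars, after explicit strip: %d chars\n",
    length $chomped, length $stripped;
```

The explicit \x0D?\x0A pattern does not depend on what the local Perl takes "\n" to mean, which is the reason to prefer it for data of unknown origin.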

As with many things in Perl, the default behavior is almost always what you want. So often, in fact, that you tend to forget there might be any other behavior.


Perl reduces RSI - it saves typing

Re^7: problem with 'bare LF' in script
by gone2015 (Deacon) on Nov 13, 2008 at 00:47 UTC

    Thank you.

    So, just to hammer this home: in MacPerl the string "\n" really is "\x0D" -- and "\r" is "\x0A". This is not something the IO layers have anything to do with. A literal string containing "\r" or "\n" in MacPerl has a different value to the same string on, say, a Linux Perl. Gosh.

    So, I suppose for MacPerl the IO layers must be capable of translating "\x0D\x0A" and "\x0A" to/from "\x0D" to allow for reading "foreign" files.

    I confess I think it would be less confusing if "\n" meant "\x0A" at all times, and translation to/from system line-endings was relegated to the IO layers. So rather than "\n" being a "virtual" line ending (and "\r" being its dual), I think it would be clearer if the IO layer "normalised" line endings to "\x0A" AKA "\n" -- which is essentially what the DOS/Winders Perl does.

    Ah well. There was something about this that didn't quite fit together, but without a Mac I could not pin it down -- perhaps I couldn't see what the documentation was trying to tell me because I was refusing to believe in escape sequences with magical shape shifting properties :-(

    Perl never ceases to amaze me.

      So, I suppose for MacPerl the IO layers must be capable of translating "\x0D\x0A" and "\x0A" to/from "\x0D" to allow for reading "foreign" files

      Nope. Well, probably not what you expect anyway. MacPerl (to the best of my knowledge) doesn't by default translate foreign line ends - it just DWIMs for the native (pre OS X) line end character, \x0d.

      You can of course use the :crlf file I/O translation layer to do the business, but the default behavior is to treat "native" files as you would expect because anything else is tricky or impossible. perlrun's PERLIO section, binmode and of course open are likely to be of interest too.
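
A rough sketch of the :crlf layer at work, assuming a Unix or Windows Perl (not checked on MacPerl). The temporary file and the read_lines helper are hypothetical, made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with CR+LF line ends; :raw stops any translation on output.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} "one\x0D\x0Atwo\x0D\x0A";
close $out or die "close failed: $!";

# Read the file back through the given layer and return the chomped lines.
sub read_lines {
    my ($layer) = @_;
    open my $in, "<$layer", $path or die "cannot open $path: $!";
    chomp(my @lines = <$in>);
    close $in;
    return @lines;
}

my @raw  = read_lines(':raw');   # no translation: a CR stays on each line
my @crlf = read_lines(':crlf');  # CR+LF is turned into "\n" on input

printf "raw lengths: %s; crlf lengths: %s\n",
    join(',', map { length } @raw), join(',', map { length } @crlf);
```

The same layer can be pushed onto a handle that is already open with binmode($fh, ':crlf').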


      Perl reduces RSI - it saves typing
Re^7: problem with 'bare LF' in script
by ikegami (Patriarch) on Nov 12, 2008 at 21:41 UTC
    The only system that's different from unix these days is Windows. The conversion is disabled by :raw, which should be used on a socket in the first place. If any other odd systems need to be supported, it should be done by a PerlIO layer like it is in Windows. It's my opinion that the referenced section of perlport is outdated.

      perlop, under Quote-and-Quote-like-Operators, says:

      All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on a Mac, these are reversed, and on systems without line terminator, printing "\n" may emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.
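
To illustrate the networking advice in that passage, here is a small sketch that spells the terminator out as octal escapes instead of relying on "\r\n". The request text and host name are placeholders, and nothing is actually sent anywhere.

```perl
use strict;
use warnings;

# Spell the terminator out as bytes. "\r\n" would depend on what the local
# Perl takes "\r" and "\n" to mean.
my $CRLF = "\015\012";

# Build a minimal HTTP/1.0 request for a placeholder host. The two empty
# strings at the end produce the blank line that ends the headers.
my $request = join $CRLF,
    'GET / HTTP/1.0',
    'Host: example.invalid',
    '', '';

# Show the last four bytes as decimal ordinals.
printf "%vd\n", substr($request, -4);
```

When writing this to a socket, the handle would also want binmode (or :raw) so that no IO layer rewrites the bytes on the way out.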

      I wasn't sure what it meant by "read "\r"", but following GrandFather above I am now taking this to mean what "\r" is interpolated to, rather than what any IO read operation might be doing -- but I could be wrong... I've struggled to get my head around it so far :-(

        I wasn't sure what it meant by "read "\r""

        "read as" means "converted to when found in double-quoted string literals" here.

        On MacPerl (Perl for old Macs), ord("\n") was 13, and ord("\r") was 10.
        On Windows, unix and new Macs, ord("\n") is 10, and ord("\r") is 13.
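
A quick way to check this on whatever Perl is at hand (a sketch only; the variable names are arbitrary):

```perl
use strict;
use warnings;

# Record what this Perl's "\n" and "\r" escapes actually are.
my %escape = ( '\n' => ord("\n"), '\r' => ord("\r") );

printf "%s is %d (0x%02X)\n", $_, $escape{$_}, $escape{$_}
    for sort keys %escape;

# True where "\n" is LF, as on Unix, Windows and Mac OS X builds.
my $newline_is_lf = "\n" eq "\x0A";
print "newline is LF: ", ($newline_is_lf ? "yes" : "no"), "\n";
```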