in reply to Re^4: sprintf is printing unexepected output (Macs)
in thread sprintf is printing unexepected output

The fact that old Macs claimed to be ASCII systems (and so provided no translation layer to real ASCII) and yet defined "\n" as something other than ASCII newline (but "\n" is still the newline character for old Macs) has caused a lot of broken thinking in the Perl world.

I don't find the thinking to which you are referring to be "broken." The broken thinking, if there is any, is in conceptualizing the escape "\n" as popularized by C to be an ASCII character. In the above, even you called it an "ASCII newline". But there is no such thing. The C standard is very clear about "\n" being implementation dependent.

All that said, it would certainly be easier on everyone if LF were universally accepted as the newline character and we could finally put an end to the suffering that continues to be inflicted upon us by long-dead hardware issues and designed incompatibilities.

-sauoq
"My two cents aren't worth a dime.";

Re^6: sprintf is printing unexepected output (newline)
by tye (Sage) on Dec 29, 2006 at 21:15 UTC
    But there is no such thing [as] "ASCII newline"

    s/newline/line feed/, if that makes you feel better. I don't make a distinction between "newline" and "linefeed", since their usage is very often mixed and there is little in those names to make the distinction clear, but I may try to honor that distinction in future (since I see this distinction being made in many references I check). I make a distinction between "ASCII (newline|linefeed)" vs. "local (newline|linefeed)" vs. "filesystem (newline|linefeed)". I certainly find plenty of evidence that there is an "ASCII line feed", which is what I meant. Note that http://foldoc.org/?query=+newline even says "Though the term 'newline' appears in ASCII standards, it never caught on in the general computing world before Unix", so I'm not sure I believe your assertion (it is too bad that the ASCII standard is likely still not free to download).

    The broken thinking, if there is any, is in conceptualizing the escape "\n" as popularized by C to be an ASCII character

    No, "\n" isn't an ASCII character (and, to be clear, I never claimed that it was). It is a local character. On all ASCII systems it is ASCII line feed, except for old Macs where they chose to be "lazy" and try to avoid binmode by defining "\n" in C incorrectly. Though if you find me an ASCII or non-ASCII system besides old Macs where "\n" isn't the "line feed" character, then I'll have to adjust my thinking. I don't think you will, however.

    All that said, it would certainly be easier on everyone if LF were universally accepted as the newline character

    "newline character"?? There is no such thing. (: The newline sequence in file systems is widely varied and often isn't a single character. Even Unix knows that it has to translate "\n" to "\r\n" on output (it just doesn't translate when doing output to a file, waiting to do it only when doing output to a device; and translate "\r" to "\n" on input similarly). Many systems encode newlines outside of the data bytes of the file (in meta data). So "newline" can be one character, more than one character, or not a character at all. :)

    I don't find the thinking to which you are referring to be "broken."

    Then find a non-ASCII system with Perl on it, open a socket from it to some SMTP server on the internet, send "HELO\015\012" to it, and tell me if it works. There are two possibilities: either "HELO" will show up in non-ASCII (and your system is broken), or "\015\012" will get translated from the local character set to the character set that is expected over internet TCP/IP connections and likely won't be "\015\012" any longer.

    The thinking is broken in assuming that you should hard-code the ASCII bit patterns for some characters but not for all. It happens to work on ASCII systems and on old Macs. And those are the only systems that they seem to know or care about or understand.

    You should never hard-code character bit patterns unless 1) you've first checked that you are on a system where such will work or 2) you are hard-coding all ASCII bit patterns in order to write an ASCII stream no matter whether your system is ASCII or not. And perlport is "broken" for thinking otherwise.
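
To illustrate option 2, here is a minimal C sketch (not from the original post) that spells out every byte of the SMTP greeting numerically, so the buffer holds ASCII even when the compiler's execution character set is something else. The names `HELO_ASCII` and `build_helo_ascii` are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* ASCII codes for 'H' 'E' 'L' 'O' CR LF, written as numbers rather than
 * character literals so they do not depend on the local character set. */
static const unsigned char HELO_ASCII[] = {
    0x48, 0x45, 0x4C, 0x4F, /* "HELO" in ASCII */
    0x0D, 0x0A              /* ASCII CR, ASCII LF */
};

/* Hypothetical helper: copies the greeting into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t build_helo_ascii(unsigned char *buf, size_t cap)
{
    size_t n = sizeof HELO_ASCII;
    if (cap < n) {
        return 0;
    }
    memcpy(buf, HELO_ASCII, n);
    return n;
}
```

The resulting bytes would still have to reach the socket through a path that does no character-set translation, which this sketch does not attempt to show.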

    - tye

      I don't make a distinction between "newline" and "linefeed"

      And that's the crux of the problem. There is a distinction to be made. Not making it just hinders understanding.

      On all ASCII systems it is ASCII line feed, except for old Macs where they chose to be "lazy" and try to avoid binmode by defining "\n" in C incorrectly.

      There is nothing "incorrect" about defining "\n" as CR with regard to the C standard. Perhaps the C standard should dictate that, if your execution character set is ASCII, then "\n" must map to ASCII LF. If you want to argue that, fine; I don't think I'm well-versed enough in the issues to agree or disagree with such an argument.

      I think your comment about choosing 'to be "lazy" and try[ing] to avoid binmode' is way off the mark, though. Apparently they didn't choose to be lazy enough and do it like the existing (virtuously) "lazy" ASCII systems out there already happily avoiding binmode. If they had, they could have stood with everyone else, sniggering and pointing their fingers at CP/M, OS/2, and DOS.

      Thank heavens for POSIX. Now, if we could just get rid of Windows once and for all we could rid ourselves altogether of that stupid "b" in our fopen()s.

      "newline character"?? There is no such thing. (:

      There isn't in ASCII. There is, on the other hand, in C. It is implementation dependent but always fits in a char and it is represented by the escape sequence '\n'.
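
A quick way to see what a particular C implementation chose is to look at the numeric value of '\n'. The following is a minimal sketch, assuming any hosted C compiler; the function names are made up for illustration. On an ASCII-based implementation that maps '\n' to LF the value is 10, whereas the old Mac behaviour discussed in this thread (mapping '\n' to CR) would give 13.

```c
/* Hypothetical helper: the numeric value this implementation assigns
 * to the newline character '\n'. */
int newline_code(void)
{
    return (unsigned char)'\n';
}

/* Hypothetical helper: 1 if '\n' has the same value as ASCII LF (0x0A),
 * 0 otherwise. */
int newline_is_ascii_lf(void)
{
    return newline_code() == 0x0A;
}
```

Either way the value fits in a single char, which is the only thing the escape sequence itself promises.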

      Then find a non-ASCII system with Perl on it and open a socket from it to some SMTP server on the internet and send "HELO\015\012" to it and tell me if it works.

      While I understand the issue you are pointing out, I don't actually work on non-ASCII systems and don't know whether this is a real problem in practice. Certainly it must have been very well addressed by now? I also fail to see how this is at all specific to Perl. You'd have the same issue if you were, say, writing an MTA in C. Right? ASCII is the lingua franca of the network. If you happen to be working on some EBCDIC variation, you've got to learn to adapt. It's like the byte order issue but at a different level.

      I guess I can't disagree that perlport is broken for suggesting that using "\015\012" is a good idea without mentioning any caveats for non-ASCII systems. Have you submitted a doc patch? It seems a simple additional note would suffice.

      For whatever it's worth though, perlport hasn't muddled my thinking at all.

      -sauoq
      "My two cents aren't worth a dime.";