Re^3: Native newline encoding

That version supports opening files in TEXT mode (similar to FTP), and there are two ways to do it: the first is to convert the native newlines to CRLF before sending them over the network; the second is to tell the client what the native newline sequence is and let it handle the burden of the conversion.

Hm. My reading of the appropriate RFC is slightly different: the server can choose whether to send CRLF or a single, simple separator sequence of its choice:


4.3 Determining Server Newline Convention

   In order to correctly process text files in a cross platform
   compatible way, the newline convention must be converted from that of
   the server to that of the client, or, during an upload, from that of
   the client to that of the server.

   Versions 3 and prior of this protocol made no provisions for
   processing text files.  Many clients implemented some sort of
   conversion algorithm, but without either a 'canonical' on the wire
   format or knowledge of the servers newline convention, correct
   conversion was not always possible.

   Starting with Version 4, the SSH_FXF_TEXT file open flag (Section
   6.3) makes it possible to request that the server translate a file to
   a 'canonical' on the wire format.  This format uses \r\n as the line
   separator.

   Servers for systems using multiple newline characters (for example,
   Mac OS X or VMS) or systems using counted records, MUST translate to
   the canonical form.

   However, to ease the burden of implementation on servers that use a
   single, simple separator sequence, the following extension allows the
   canonical format to be changed.

        string "newline"
        string new-canonical-separator (usually "\r" or "\n" or "\r\n")

   All clients MUST support this extension.

   When processing text files, clients SHOULD NOT translate any
   character or sequence that is not an exact match of the servers
   newline separator.

   In particular, if the newline sequence being used is the canonical
   "\r\n" sequence, a lone \r or a lone \n SHOULD be written through
   without change.

And it is down to the clients to convert whatever the server sends to their required local form.
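A minimal sketch of what that client-side conversion might look like, assuming the client has already learned the server's separator from the "newline" extension (falling back to the canonical "\r\n" when none was sent). The helper name to_local_newlines and its parameters are made up for illustration; they are not from the draft or from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: convert text received from the server into the
# client's local newline convention.
#   $server_nl  separator advertised by the server (assumed default: CRLF)
#   $local_nl   separator wanted locally (assumed default: LF)
sub to_local_newlines {
    my ($text, $server_nl, $local_nl) = @_;
    $server_nl = "\x0D\x0A" unless defined $server_nl;
    $local_nl  = "\x0A"     unless defined $local_nl;

    # Translate only exact matches of the server's separator, as the
    # draft asks. \Q...\E keeps the separator from being parsed as regex
    # syntax. With CRLF as the separator, a lone CR or a lone LF does
    # not match and is passed through unchanged.
    $text =~ s/\Q$server_nl\E/$local_nl/g;
    return $text;
}
```

Note that this works on a whole buffer; a real client reading in chunks would also have to cope with a CRLF split across two reads.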

At this point, it seems to me that the simple solution is the first one: let Perl read the file in text mode and then apply s/\n/\r\n/. This may be slightly incorrect in some edge cases (for instance, files on Windows with \n line endings) that nobody would care about, so I don't either!

I whole-heartedly agree, though I would approach that solution in a slightly different manner.

When TEXT mode is requested:

Open the file in text mode;
Read the file line-by-line using the system default INPUT_RECORD_SEPARATOR ($/);
chomp each line read;
Write to the socket line-by-line, having set the OUTPUT_RECORD_SEPARATOR ($\) to CRLF;

This way, whatever the local line separator is, it gets taken care of by Perl (or the CRT if you're using XS). And the data is transmitted with the required 'canonical newlines'.
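The sending side could be sketched roughly like this. The sub name send_text_file is hypothetical, and $out stands in for the socket. One assumption worth flagging: the output handle is put into binmode, because on platforms where Perl applies a :crlf layer by default, printing an explicit CRLF through that layer may come out as CR CR LF.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a local text file to an output handle
# (a socket in real use), terminating every line with CRLF.
sub send_text_file {
    my ($path, $out) = @_;

    # Plain open = text mode; Perl (or the CRT) maps the platform's
    # native line ending to "\n" on input.
    open my $in, '<', $path or die "Cannot open $path: $!";

    # Raw bytes on the wire: no layer may rewrite our explicit CRLF.
    binmode $out;

    local $\ = "\x0D\x0A";        # $OUTPUT_RECORD_SEPARATOR, appended by print
    while (my $line = <$in>) {    # reads up to the default $/
        chomp $line;              # strip the local separator
        print {$out} $line;       # line goes out followed by CRLF
    }
    close $in;
    return;
}
```

One side effect of this approach: a final line that had no trailing newline in the file is sent with a CRLF anyway.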

Clients then do the same in reverse: read from the socket line-by-line, having set their INPUT_RECORD_SEPARATOR to CRLF; chomp; and write line-by-line to a file opened in text mode, terminating each line with "\n" so that the native separator for their local platform is produced.
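And a matching sketch for the receiving end, again with a made-up sub name (receive_text_file) and $in standing in for the socket. Since Perl's $\ is undefined by default, the sketch sets it to "\n" explicitly and relies on the text-mode output handle to turn that into whatever the local platform uses.

```perl
use strict;
use warnings;

# Hypothetical helper: read CRLF-terminated lines from an input handle
# (a socket in real use) and write them to a local file in text mode.
sub receive_text_file {
    my ($in, $path) = @_;

    binmode $in;                  # see the raw CRLF bytes from the wire

    # Plain open = text mode; "\n" is mapped to the native line ending.
    open my $out, '>', $path or die "Cannot open $path: $!";

    local $/ = "\x0D\x0A";        # $INPUT_RECORD_SEPARATOR: lines end at CRLF
    local $\ = "\n";              # $OUTPUT_RECORD_SEPARATOR for the local file
    while (my $line = <$in>) {
        chomp $line;              # removes the CRLF, because $/ is CRLF
        print {$out} $line;       # written followed by the local newline
    }
    close $out or die "Cannot close $path: $!";
    return;
}
```

Because $/ is exactly CRLF here, a lone \r or \n inside a line is not treated as a line break and is written through as-is.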

This way, the conversions are taken care of at both ends by perl or the CRT. At least, for ASCII/ANSI/ISO-whatever-that-number-is files that have the 'correct' newlines on the originating platforms.

Things (will) get far more messy once the RFCs start dealing with Unicrap.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^4: Native newline encoding by sauoq (Abbot) on May 23, 2012 at 11:55 UTC
Things (will) get far more messy once the RFCs start dealing with Unicrap.

The RFCs have handled binary data for years, and no one batted an eye.

`-sauoq "My two cents aren't worth a dime.";`
Re^5: Native newline encoding by BrowserUk (Patriarch) on May 23, 2012 at 15:04 UTC
So, you consider this "binary":

C:\test>od -t x1 Huawei.xml | head
0000000 0d 0a 0d 0a 0d 0a 0d 0a 3c 21 44 4f 43 54 59 50
0000020 45 20 48 54 4d 4c 20 50 55 42 4c 49 43 20 22 2d
0000040 2f 2f 57 33 43 2f 2f 44 54 44 20 48 54 4d 4c 20
0000060 34 2e 30 31 20 54 72 61 6e 73 69 74 69 6f 6e 61
0000100 6c 2f 2f 45 4e 22 20 22 68 74 74 70 3a 2f 2f 77
0000120 77 77 2e 77 33 2e 6f 72 67 2f 54 52 2f 68 74 6d
0000140 6c 34 2f 6c 6f 6f 73 65 2e 64 74 64 22 3e 0d 0a
0000160 3c 68 74 6d 6c 3e 0d 0a 3c 68 65 61 64 3e 0d 0a
0000200 3c 74 69 74 6c 65 3e e8 8f af e7 82 ba ef bc 8c
0000220 e8 8f af e7 82 ba e5 85 ac e5 8f b8 ef bc 8c e8
...
Re^6: Native newline encoding by sauoq (Abbot) on May 23, 2012 at 21:38 UTC
Without getting all epistemological, are you claiming it isn't?

`-sauoq "My two cents aren't worth a dime.";`
Re^7: Native newline encoding by BrowserUk (Patriarch) on May 28, 2012 at 11:44 UTC
Re^8: Native newline encoding by ikegami (Patriarch) on May 29, 2012 at 23:58 UTC
Re^8: Native newline encoding by sauoq (Abbot) on May 28, 2012 at 17:40 UTC