in reply to Detect line endings with CGI.pm upload

If you are absolutely sure that they are text files, ... something like the following could be used to normalize everything to match your current system's line endings.

Assume the content of the file is in $file:

$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;

Now when you write the string to disk, the line endings will be consistent.
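
As a quick check of the substitution above, here is a minimal sketch that runs it over a made-up sample string mixing all three conventions. The sample data and the @lines variable are assumptions for illustration only.

```perl
use strict;
use warnings;

# Sample text mixing all three conventions: CRLF, lone CR, lone LF.
my $file = "one\x0d\x0atwo\x0dthree\x0afour";

# Same substitution as above; a CRLF pair is tried before a lone CR.
$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;

# Every line ending is now "\n", so a plain split sees four lines.
my @lines = split /\n/, $file;
print scalar(@lines), " lines: @lines\n";
```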

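It's important to do the alternation in the correct order. If the lone \x0d alternative came first, the two characters of a CRLF pair would be matched separately and each replaced, so every Windows line ending would turn into two newlines. If you don't have to worry about old Macs, you can get away with: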

$file =~ s/\x0d?\x0a/\n/smg;
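
The remark about alternation order can be illustrated with a small sketch. It compares the three-way expression with a deliberately reordered variant on a single CRLF pair; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $crlf = "a\x0d\x0ab";

# Correct order: the CRLF pair is consumed as a single line ending.
(my $good = $crlf) =~ s/(\x0d?\x0a|\x0d)/\n/g;

# Wrong order: the lone CR matches first, then the LF matches on its own.
(my $bad = $crlf) =~ s/(\x0d|\x0d?\x0a)/\n/g;

# tr/\n// with an empty replacement list counts without modifying.
printf "good has %d newline(s), bad has %d\n",
    ($good =~ tr/\n//), ($bad =~ tr/\n//);
```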

Make certain you don't do this to non-text files; the results are not recoverable.

G. Wade

Re^2: Detect line endings with CGI.pm upload
by ikegami (Patriarch) on Dec 26, 2008 at 22:07 UTC
    If you don't have to worry about old Macs, you don't have to worry about running on old Macs, so you can get away with
    s/\r?\n/\n/g
    or just
    s/\r(?=\n)//g
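
A short sketch, on made-up sample text, suggesting that the two substitutions above give the same result for CRLF and LF input. Neither one touches a lone CR.

```perl
use strict;
use warnings;

my $text = "a\r\nb\r\nc\n";

# Replace an optional CR plus LF with a bare newline.
(my $first = $text) =~ s/\r?\n/\n/g;

# Delete only those CRs that are immediately followed by an LF.
(my $second = $text) =~ s/\r(?=\n)//g;

print $first eq $second ? "same\n" : "different\n";
```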

      I remember a thread here recently pointing out that \n is not guaranteed to be a line feed, and that the suggested approach was to use \x0a and \x0d because you can be sure of their meanings.

      I vaguely remember having that problem way back when I was a C programmer.

      G. Wade
        The only system where \n and \r aren't LF and CR is MacPerl (Perl for old Macs). This has nothing to do with C.
Re^2: Detect line endings with CGI.pm upload
by apu (Sexton) on Dec 28, 2008 at 10:58 UTC

    Since we're using CGI.pm's upload function, $file is the file name on the client computer (in a scalar context) or a file-handle, but not the contents of the file itself. As such, $file =~ s/(\x0d?\x0a|\x0d)/\n/smg; is performing the substitution on the file name, not the contents of the file. Still, it's a good idea.
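
One way to apply the substitution to the contents rather than the name is to slurp the handle first. The sketch below uses an in-memory filehandle as a stand-in for the upload; slurp_normalized is a hypothetical helper, and with CGI.pm the handle would presumably come from something like $q->upload('field_name'), where the field name is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle and
# normalize its line endings to "\n".
sub slurp_normalized {
    my ($fh) = @_;
    binmode $fh;                           # no CRLF translation layer
    my $content = do { local $/; <$fh> };  # undef $/ means slurp mode
    $content =~ s/(\x0d?\x0a|\x0d)/\n/g;
    return $content;
}

# Stand-in for an uploaded file: in-memory data with mixed endings.
my $raw = "alpha\x0d\x0abeta\x0dgamma\x0a";
open my $fh, '<', \$raw or die "Cannot open in-memory file: $!";
my $text = slurp_normalized($fh);
close $fh;

print $text;
```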

    Manually setting $/ works but how do I detect what the line-ending should be so I can set $/ programmatically? perlvar is very explicit about it being a string, not a regex, so that's not an option. And the line-ending can be different for each of the three files being uploaded, even within a single CGI upload, so I cannot even do something based on the browser User-Agent.
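
One possible way to set $/ per file is to peek at the data, take the first line ending found, and rewind. This is only a sketch: detect_eol is a hypothetical helper, it assumes the handle is seekable (an upload spooled to a temporary file should be, but that is an assumption), and it assumes each file uses one convention throughout.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line ending seen in the file
# ("\x0d\x0a", "\x0d" or "\x0a"), or undef if there is none.
sub detect_eol {
    my ($fh) = @_;
    binmode $fh;
    my ($eol, $buf) = (undef, '');
    while (read($fh, my $chunk, 4096)) {
        $buf .= $chunk;
        # A CR at the very end may be half of a CRLF pair, so read on.
        next if $buf =~ /\x0d\z/ && !eof($fh);
        if ($buf =~ /(\x0d\x0a|\x0d|\x0a)/) {
            $eol = $1;
            last;
        }
        $buf = '';    # nothing found yet; drop what was scanned
    }
    seek $fh, 0, 0;   # rewind so the caller can read from the start
    return $eol;
}

# Three in-memory stand-ins for uploads, one per convention.
my @counts;
for my $sample ("a\x0d\x0ab\x0d\x0a", "a\x0db\x0d", "a\x0ab\x0a") {
    open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
    my $eol = detect_eol($fh);
    local $/ = defined $eol ? $eol : "\n";   # fall back if none found
    chomp(my @lines = <$fh>);
    push @counts, scalar @lines;
    close $fh;
}
print "lines per file: @counts\n";
```

Because $/ is localized inside the loop body, each uploaded file gets its own setting and the global value is restored afterwards.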

      As I said, if we assumed the content of the file was in $file we could use that expression. If you want the content elsewhere, you can change the expression.

      If you are looking at determining the line ending while reading the data from the socket, you might want to look at the user agent identification in the HTTP_USER_AGENT environment variable. You should be able to determine the OS from there.

      G. Wade