Detect line endings with CGI.pm upload

apu has asked for the wisdom of the Perl Monks concerning the following question:

Re: Detect line endings with CGI.pm upload by gwadej (Chaplain) on Dec 26, 2008 at 21:55 UTC
If you are absolutely sure that they are text files, ... something like the following could be used to normalize everything to match your current system's line endings. Assume the content of the file is in `$file`: `$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;` Now when you write the string to disk, the line endings will be consistent. It's important to do the alternation in the correct order; otherwise you can get a surprising result: if the bare `\x0d` alternative came first, every CRLF pair would turn into two newlines. If you don't have to worry about old Macs, you can get away with: `$file =~ s/\x0d?\x0a/\n/smg;` Make certain you don't do this to non-text files; the results are not recoverable. G. Wade
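A minimal sketch of why the order of the alternation matters, assuming the file content is already in a string. The helper names `normalize_newlines` and `normalize_wrong_order` are made up for this illustration; only the first uses the expression from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution from the post, CRLF-or-LF tried first.
sub normalize_newlines {
    my ($text) = @_;
    # "\x0d?\x0a" consumes a CR LF pair as one unit before bare CR is tried.
    $text =~ s/\x0d?\x0a|\x0d/\n/g;
    return $text;
}

# Same alternatives in the wrong order, for comparison only.
sub normalize_wrong_order {
    my ($text) = @_;
    # Bare CR matches first, then the LF that followed it matches separately.
    $text =~ s/\x0d|\x0d?\x0a/\n/g;
    return $text;
}

my $mixed = "a\x0d\x0ab\x0dc\x0ad";    # one CRLF, one bare CR, one bare LF

my $good = normalize_newlines($mixed);
my $bad  = normalize_wrong_order($mixed);

# Count the newlines each version produced.
my $good_count = ($good =~ tr/\n//);
my $bad_count  = ($bad  =~ tr/\n//);
print "correct order: $good_count newlines\n";
print "wrong order:   $bad_count newlines\n";
```

With the wrong order, the CRLF pair is counted twice, so the result contains an extra blank line.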
Re^2: Detect line endings with CGI.pm upload by ikegami (Patriarch) on Dec 26, 2008 at 22:07 UTC
If you don't have to worry about old Macs, then you also don't have to worry about running on old Macs (where `\n` is a carriage return), so you can get away with `s/\r?\n/\n/g` or just `s/\r(?=\n)//g`
Re^3: Detect line endings with CGI.pm upload by gwadej (Chaplain) on Dec 26, 2008 at 23:06 UTC
I remember a thread here recently pointing out that `\n` is not guaranteed to be a line feed, and that the suggested approach was to use `\x0a` and `\x0d` because you can be sure of their meanings. I vaguely remember having that problem way back when I was a C programmer. G. Wade
Re^4: Detect line endings with CGI.pm upload by ikegami (Patriarch) on Dec 26, 2008 at 23:43 UTC
Re^5: Detect line endings with CGI.pm upload by gwadej (Chaplain) on Dec 28, 2008 at 20:55 UTC
Re^2: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 28, 2008 at 10:58 UTC
Since we're using CGI.pm's upload function, `$file` is the file name on the client computer (in a scalar context) or a filehandle, but not the contents of the file itself. As such, `$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;` is performing the substitution on the file name, not the contents of the file. But it's a good idea. Manually setting `$/` works, but how do I detect what the line ending should be so I can set `$/` programmatically? perlvar is very explicit about it being a string, not a regex, so that's not an option. And the line ending can be different for each of the three files being uploaded, even within a single CGI upload, so I cannot even do something based on the browser User-Agent.
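One way to set `$/` per file is to peek at the start of the data and use the first terminator found there. The following is a sketch only: `detect_line_ending` and `read_lines` are hypothetical helpers, the 4096-byte sample size is arbitrary, and it assumes the handle is seekable and that each file uses one line ending consistently. An in-memory filehandle stands in for the handle returned by `$cgi->upload('file')`.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line terminator found in a sample
# of the data, or undef if the sample contains none.
sub detect_line_ending {
    my ($sample) = @_;
    return $1 if $sample =~ /(\x0d\x0a|\x0a|\x0d)/;
    return undef;
}

# Hypothetical helper: read all lines from a seekable handle, using the
# detected terminator as the input record separator.
sub read_lines {
    my ($fh) = @_;
    binmode $fh;                          # keep the I/O layer from translating CRLF
    my $sample = '';
    read($fh, $sample, 4096);
    # If the sample ends in CR, fetch one more byte so CRLF is not mistaken for CR.
    read($fh, $sample, 1, length $sample) if $sample =~ /\x0d\z/;
    seek($fh, 0, 0) or die "cannot rewind: $!";

    local $/ = detect_line_ending($sample);   # undef means "read it all as one line"
    my @lines = <$fh>;
    chomp @lines;                             # chomp removes whatever $/ currently is
    return @lines;
}

# Stand-in for an uploaded file that uses old-Mac (CR only) line endings.
my $upload = "key1\tvalue1\x0dkey2\tvalue2\x0d";
open my $fh, '<', \$upload or die "cannot open in-memory file: $!";
print "$_\n" for read_lines($fh);
close $fh;
```

Because the detection runs once per handle, each of the three uploaded files can end up with a different `$/`.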
Re^3: Detect line endings with CGI.pm upload by gwadej (Chaplain) on Dec 28, 2008 at 21:01 UTC
As I said, if we assumed the content of the file was in `$file` we could use that expression. If you want the content elsewhere, you can change the expression. If you are looking at determining the line ending while reading the data from the socket, you might want to look at the user agent identification in the `HTTP_USER_AGENT` environment variable. You should be able to determine the OS from there. G. Wade
Re: Detect line endings with CGI.pm upload by ikegami (Patriarch) on Dec 26, 2008 at 22:23 UTC
`foreach my $line (<$filehandle>)` needlessly loads the entire file into memory, unlike the otherwise equivalent `while (my $line = <$filehandle>)`
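The difference can be observed by checking whether the handle has already reached end-of-file when the loop body first runs. This is only an illustration; the two helper names are invented, and an in-memory filehandle is used so the snippet is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: is the handle at EOF when the foreach body first runs?
sub at_eof_in_first_foreach_pass {
    my ($fh) = @_;
    foreach my $line (<$fh>) {    # list context: every line is read up front
        return eof($fh) ? 1 : 0;
    }
    return;
}

# Hypothetical helper: the same check for the while form.
sub at_eof_in_first_while_pass {
    my ($fh) = @_;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        return eof($fh) ? 1 : 0;
    }
    return;
}

my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
print "foreach, at EOF on first pass: ", at_eof_in_first_foreach_pass($fh), "\n";
seek($fh, 0, 0);
print "while,   at EOF on first pass: ", at_eof_in_first_while_pass($fh), "\n";
close $fh;
```

The `foreach` form reports 1 because the whole file was consumed before the first iteration, while the `while` form reports 0.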
Re: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 26, 2008 at 23:39 UTC
The input files are all generated by various third-party databases and saved as either tab-delimited or comma-separated text files. But maybe I've been staring at this too long, because I must still be missing something stupid...

Example input file, as seen with "`less test.txt`" in either a Mac OS X terminal or on the Linux console:

    key1{tab}value1{\cM}key2{tab}value2{\cM}key3{tab}value3{\cM}key4{tab}value4

Test Perl script:

    #!/usr/bin/perl -T
    use CGI;
    my $cgi = new CGI;
    my $file = $cgi->upload('file');
    print "Content-type: text/plain\n\n";
    $file =~ s/(\x0d?\x0a|\x0d)/\n/smg;
    while (my $line = <$file>) {
        chop $line;
        ($key, $value) = split(/\t/,$line);
        print "key: $key\nvalue: $value\n";
    }

Output:

    key: key1
    value: value1{\cM}key2

where {tab} is a tab character and {\cM} is a Control-M character.
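One possible fix, sketched under assumptions: read the contents from the handle first, then split on any of the three line endings instead of relying on `$/`. `parse_pairs` is a hypothetical helper, not part of CGI.pm, and an in-memory filehandle stands in for the one returned by `$cgi->upload('file')`.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a filehandle and return [key, value] pairs,
# accepting CRLF, LF, or CR as the line terminator.
sub parse_pairs {
    my ($fh) = @_;
    binmode $fh;
    my $content = do { local $/; <$fh> };    # undef $/ reads the whole file
    $content = '' unless defined $content;

    my @pairs;
    for my $line (split /\x0d\x0a|\x0a|\x0d/, $content) {
        next if $line eq '';                 # skip blank lines
        my ($key, $value) = split /\t/, $line, 2;
        push @pairs, [$key, $value];
    }
    return @pairs;
}

# The test file from the post: tab-separated pairs joined by Control-M.
my $upload = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3\x0dkey4\tvalue4";
open my $fh, '<', \$upload or die "cannot open in-memory file: $!";
for my $pair (parse_pairs($fh)) {
    print "key: $pair->[0]\nvalue: $pair->[1]\n";
}
close $fh;
```

This holds the whole file in memory, which is a reasonable trade-off only for small uploads.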
Re^2: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 27, 2008 at 01:12 UTC
Well, if you know the length of the pairs then I'm assuming you could figure out the easy answer, but I'm guessing you're looking at a dynamic field length. For something dynamic, I would look at the dos2unix docs to see what the tool removes to make a file Unix-compatible (guessing you've been there though). Outside of all that, I would be inclined to try something like:

    while (my $line = <INFILE>) {
        $line =~ s/\n$//;
        # the {\cM} should create a word boundary since it's not alphanumeric
        while ($line =~ s/(\w+)\/(\w+)//) {
            print "key: $1\nvalue: $2\n";
        }
    }

If you're looking at something where the line feeds are not showing up at all (so that the whole file is read in as one line), I'm not 100% sure. I'll think about it, but I have not run across that particular scenario yet.
Re^3: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 27, 2008 at 04:51 UTC
It's one long line of input, at least with this particular test file. And your guess is correct: the data is of varied length, so we can't just count characters or anything like that.
Re^4: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 27, 2008 at 08:08 UTC
Re^5: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 27, 2008 at 08:39 UTC
Re^4: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 28, 2008 at 00:11 UTC