Re: Detect line endings with CGI.pm upload
by gwadej (Chaplain) on Dec 26, 2008 at 21:55 UTC
If you are absolutely sure that they are text files, ... something like the following could be used to normalize everything to match your current system's line endings.
Assume the content of the file is in $file:
$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;
Now when you write the string to disk, the line endings will be consistent.
It's important to do the alternation in the correct order, otherwise you can get a surprising result. If you don't have to worry about old Macs, you can get away with:
$file =~ s/\x0d?\x0a/\n/smg;
Make certain you don't do this to non-text files; the results are not recoverable.
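To illustrate why the alternation order matters, here is a minimal sketch. The helper names and the sample string are made up for this example; only the first regex comes from the post above. With the bare carriage return tried first, a CRLF pair is replaced twice and you end up with an extra empty line.

```perl
use strict;
use warnings;

# Hypothetical helper: normalize CRLF, LF, and bare CR to "\n".
sub normalize_eol {
    my ($text) = @_;
    # CRLF (or bare LF) is tried first, bare CR last.
    $text =~ s/\x0d?\x0a|\x0d/\n/g;
    return $text;
}

# Same alternatives in the wrong order, for comparison only:
# the CR of a CRLF pair matches on its own, then the LF matches again.
sub normalize_eol_wrong_order {
    my ($text) = @_;
    $text =~ s/\x0d|\x0d?\x0a/\n/g;
    return $text;
}

# Made-up sample with all three endings: CRLF, bare CR, bare LF.
my $mixed = "a\x0d\x0ab\x0dc\x0ad";

my @right = split /\n/, normalize_eol($mixed);
my @wrong = split /\n/, normalize_eol_wrong_order($mixed);

print scalar(@right), " fields with CRLF tried first\n";
print scalar(@wrong), " fields with bare CR tried first\n";
```

The second count is one higher because the CRLF pair turned into two newlines.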
If you don't have to worry about old Macs, you don't have to worry about running on old Macs, so you can get away with
s/\r?\n/\n/g
or just
s/\r(?=\n)//g
I remember a thread here recently pointing out that \n is not guaranteed to be a line feed, and that the suggested approach was to use \x0a and \x0d because you can be sure of their meanings.
I vaguely remember having that problem way back when I was a C programmer.
Since we're using CGI.pm's upload function, $file is the file name on the client computer (in a scalar context) or a file-handle, but not the contents of the file itself. As such, $file =~ s/(\x0d?\x0a|\x0d)/\n/smg; is performing the substitution on the file name, not the contents of the file. But it's a good idea.
Manually setting $/ works but how do I detect what the line-ending should be so I can set $/ programmatically? perlvar is very explicit about it being a string, not a regex, so that's not an option. And the line-ending can be different for each of the three files being uploaded, even within a single CGI upload, so I cannot even do something based on the browser User-Agent.
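One possible way to set $/ per file is to peek at the first chunk of each upload, take the first line terminator found there, and rewind. This is only a sketch: detect_eol is a made-up helper, the 4096-byte peek size is arbitrary, and it assumes the upload filehandle is seekable (an in-memory handle stands in for it here).

```perl
use strict;
use warnings;

# Hypothetical helper: peek at the start of a seekable filehandle and return
# the first line terminator found ("\x0d\x0a", "\x0a", or "\x0d").
sub detect_eol {
    my ($fh, $peek_bytes) = @_;
    $peek_bytes ||= 4096;              # arbitrary peek size (assumption)
    binmode $fh;                       # no newline translation while peeking
    my $start = tell $fh;
    read($fh, my $buf, $peek_bytes);
    seek $fh, $start, 0;               # rewind so the caller sees the whole file
    return $1 if defined $buf && $buf =~ /(\x0d\x0a|\x0a|\x0d)/;
    return "\n";                       # assumption: fall back to the native ending
}

# In-memory handle standing in for the upload filehandle (CR-only test data).
my $data = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @pairs;
{
    local $/ = detect_eol($fh);        # set per file, restored at end of block
    while (my $line = <$fh>) {
        chomp $line;                   # chomp removes whatever $/ currently holds
        push @pairs, [ split /\t/, $line, 2 ];
    }
}
print "key: $_->[0]\nvalue: $_->[1]\n" for @pairs;
```

Since the detection runs once per filehandle, each of the three uploads can get its own $/. One known weakness: if the peek buffer happens to end between the CR and the LF of a CRLF pair, this would report a bare CR.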
As I said, if we assumed the content of the file was in $file we could use that expression. If you want the content elsewhere, you can change the expression.
If you are looking at determining the line ending while reading the data from the socket, you might want to look at the user agent identification in the HTTP_USER_AGENT environment variable. You should be able to determine the OS from there.
Re: Detect line endings with CGI.pm upload
by ikegami (Patriarch) on Dec 26, 2008 at 22:23 UTC
foreach my $line (<$filehandle>)
needlessly loads the entire file into memory unlike the otherwise equivalent
while (my $line = <$filehandle>)
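The difference comes down to context. A small sketch with made-up in-memory data: in list context the readline operator returns every line at once, while in scalar context it returns one line per call.

```perl
use strict;
use warnings;

# Made-up sample data: five short lines held in a string.
my $data = join '', map { "line $_\n" } 1 .. 5;

# List context (what foreach imposes): all lines are read before the loop starts.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @all = <$fh>;
close $fh;

# Scalar context (what while imposes): one line is read per iteration.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $count = 0;
while (my $line = <$fh>) {
    $count++;
}
close $fh;

print scalar(@all), " lines read in list context, $count in scalar context\n";
```

Both loops see the same lines; only the peak memory use differs, which matters for large uploads.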
Re: Detect line endings with CGI.pm upload
by apu (Sexton) on Dec 26, 2008 at 23:39 UTC
The input files are all generated by various third-party databases and saved as either tab-delimited or comma-separated text files. But, maybe I've been staring at this too long, because I must still be missing something stupid...
Example input file, as seen with "less test.txt" in either a Mac OS X terminal or on the Linux console:
key1{tab}value1{\cM}key2{tab}value2{\cM}key3{tab}value3{\cM}key4{tab}value4
Test Perl script:
#!/usr/bin/perl -T
use CGI;
my $cgi = new CGI;
my $file = $cgi->upload('file');
print "Content-type: text/plain\n\n";
$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;
while (my $line = <$file>)
{
chop $line;
($key, $value) = split(/\t/,$line);
print "key: $key\nvalue: $value\n";
}
Output:
key: key1
value: value1{\cM}key2
where {tab} is a tab character and {\cM} is a Control-M character.
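For what it's worth, the substitution in that script runs against the filehandle in $file rather than the file's contents, and the readline still splits on the default $/, so a CR-only file arrives as one long line. A possible workaround, sketched here with a made-up helper name and an in-memory handle standing in for the upload, is to slurp the contents and split on any of the three endings:

```perl
use strict;
use warnings;

# Hypothetical helper: read everything from a filehandle and split it into
# lines on CRLF, LF, or bare CR.
sub read_lines_any_eol {
    my ($fh) = @_;
    binmode $fh;                                  # read the bytes untranslated
    my $content = do { local $/; <$fh> };         # undef $/ slurps to end of file
    return () unless defined $content && length $content;
    return split /\x0d\x0a|\x0a|\x0d/, $content;  # CRLF is tried before bare CR
}

# In-memory handle standing in for $cgi->upload('file'); data mirrors the test file.
my $data = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3\x0dkey4\tvalue4";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

for my $line (read_lines_any_eol($fh)) {
    my ($key, $value) = split /\t/, $line, 2;
    print "key: $key\nvalue: $value\n";
}
```

This does load the whole file into memory, so it trades the memory concern raised earlier in the thread for simplicity; it is probably only reasonable for modestly sized uploads.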
Well, if you knew the number of pairs or their lengths, I assume you could work out the easy answer, but I'm guessing you're looking at dynamic field lengths.
For something dynamic, I would look at the dos2unix docs to see what the tool removes to make a file Unix-compatible (I'm guessing you've been there, though).
Outside of all that, I would be inclined to try something like:
while (my $line = <INFILE>) {
    $line =~ s/\n$//;
    # The {\cM} should create a word boundary, since it's not alphanumeric.
    while ($line =~ s/(\w+)\t(\w+)//) {
        print "key: $1\nvalue: $2\n";
    }
}
If you're looking at something where the line feeds are not showing up at all (so that the whole file is read in as one line), I'm not 100% sure. I'll think about it, but I have not run across that particular scenario yet.
It's one long line of input, at least with this particular test file. And your guess is correct: the data is of varied length, so we can't just count characters or anything like that.