Re: Detect line endings with CGI.pm upload

The input files are all generated by various third-party databases and saved as either tab-delimited or comma-separated text files. Maybe I've been staring at this too long, but I must still be missing something obvious. Here is an example input file, as shown by `less test.txt` in either a Mac OS X terminal or on the Linux console:

key1{tab}value1{\cM}key2{tab}value2{\cM}key3{tab}value3{\cM}key4{tab}value4

Test Perl script:

#!/usr/bin/perl -T

use CGI;
my $cgi = new CGI;

my $file = $cgi->upload('file');

print "Content-type: text/plain\n\n";

$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;
while (my $line = <$file>)
{ 
  chop $line;
  ($key, $value) = split(/\t/,$line);
  print "key: $key\nvalue: $value\n";
}

Output:

key: key1
value: value1{\cM}key2

where {tab} is a tab character and {\cM} is a Control-M (carriage return, 0x0D) character.
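The likely culprit: CGI.pm's `upload()` returns a filehandle, so the `s///` in the script is applied to the variable holding that handle rather than to the uploaded data. `<$file>` then still splits records on `$/`, which defaults to "\n", so a file with only CR endings arrives as a single line and `chop` just removes its final character. A minimal sketch, assuming the whole upload fits comfortably in memory: slurp the handle, convert CRLF and bare CR to LF, then split. The helper names `split_lines` and `read_lines` are hypothetical, and an in-memory filehandle stands in for the real upload so the example runs without a web server.

```perl
use strict;
use warnings;

# Convert CRLF (Windows) and bare CR (classic Mac OS) endings to LF,
# then split into lines. split drops any trailing empty fields.
sub split_lines {
    my ($text) = @_;
    $text =~ s/\x0d\x0a?/\x0a/g;
    return split /\x0a/, $text;
}

# Slurp everything from a filehandle (in a CGI script, the one returned
# by $cgi->upload('file')) and return its lines, whatever the ending style.
sub read_lines {
    my ($fh) = @_;
    binmode $fh;                          # raw bytes: no I/O layer rewrites CR/LF
    my $text = do { local $/; <$fh> };    # undef $/ => read the whole file
    return split_lines(defined $text ? $text : '');
}

# Hypothetical stand-in for the upload: an in-memory handle with CR-only data.
my $data = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3\x0dkey4\tvalue4";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @pairs;
for my $line (read_lines($fh)) {
    next unless length $line;             # skip blank lines
    my ($key, $value) = split /\t/, $line, 2;
    push @pairs, [ $key, $value ];
    print "key: $key\nvalue: $value\n";
}
close $fh;
```

In the real script you would pass `$cgi->upload('file')` to `read_lines` instead of the in-memory handle; run as-is, the sketch prints all four key/value pairs rather than just the first.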


Re^2: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 27, 2008 at 01:12 UTC
Well, if you know the number of pairs, then I assume you could figure out the easy answer, but I'm guessing you're dealing with variable-length fields. For something dynamic, I would look at the dos2unix docs to see what the tool removes to make a file Unix-compatible (I'm guessing you've been there already, though). Beyond that, I would be inclined to try something like:

    while (my $line = <INFILE>) {
        $line =~ s/\n$//;
        # The {\cM} should act as a word boundary, since it isn't alphanumeric
        while ($line =~ s/(\w+)\t(\w+)//) {
            print "key: $1\nvalue: $2\n";
        }
    }

If the line feeds aren't showing up at all (so the whole file is read in as one line), I'm not 100% sure. I'll think about it, but I haven't run across that particular scenario yet.
Re^3: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 27, 2008 at 04:51 UTC
It's one long line of input, at least with this particular test file. And your guess is correct: the data is variable-length, so we can't just count characters or anything like that.
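Since the thread is about detecting the line endings: a minimal sketch that counts CRLF, bare CR and bare LF in the first few kilobytes of the file, picks the most common, and then reads records with `$/` set to that terminator (`chomp` removes whatever `$/` currently holds, not just "\n"). `detect_eol` is a hypothetical helper, the 4096-byte sample size is an arbitrary assumption, and an in-memory filehandle stands in for `$cgi->upload('file')`. It also assumes the handle is seekable, which should hold for CGI.pm's spooled upload file but is worth checking.

```perl
use strict;
use warnings;

# Guess the line terminator in a chunk of text by counting each style.
# Returns "\x0d\x0a" (CRLF), "\x0d" (bare CR) or "\x0a" (LF, the default).
sub detect_eol {
    my ($text) = @_;
    my $crlf = () = $text =~ /\x0d\x0a/g;
    my $cr   = () = $text =~ /\x0d(?!\x0a)/g;
    my $lf   = () = $text =~ /(?<!\x0d)\x0a/g;
    return "\x0d\x0a" if $crlf > 0 && $crlf >= $cr && $crlf >= $lf;
    return "\x0d"     if $cr > $lf;
    return "\x0a";
}

# Hypothetical stand-in for the upload: an in-memory handle with CR-only data.
my $data = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3\x0dkey4\tvalue4";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Peek at the start of the file, then rewind so the loop sees everything.
read($fh, my $sample, 4096);
seek($fh, 0, 0) or die "Cannot rewind: $!";
my $eol = detect_eol(defined $sample ? $sample : '');

my %pairs;
{
    local $/ = $eol;    # make <$fh> split records on the detected terminator
    while (my $line = <$fh>) {
        chomp $line;    # strips the current $/ from the end of the record
        next unless length $line;
        my ($key, $value) = split /\t/, $line, 2;
        $pairs{$key} = $value;
    }
}
close $fh;
print "$_ => $pairs{$_}\n" for sort keys %pairs;
```

Counting rather than stopping at the first CR keeps a stray carriage return inside a mostly-LF file from flipping the guess.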
Re^4: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 27, 2008 at 08:08 UTC
So how do you know what key1 represents? To put it another way, how do you know which column key1 is a unique value in? Or am I missing something obvious? I'm trying to get a picture of this dataset that's built from a database. Do you have a simple snapshot of the datasets you might be receiving?
Re^5: Detect line endings with CGI.pm upload by apu (Sexton) on Dec 27, 2008 at 08:39 UTC
Re^4: Detect line endings with CGI.pm upload by Anonymous Monk on Dec 28, 2008 at 00:11 UTC
Sounds like a tough spot you're in. The only thing I can suggest is to either get the user uploading the data to use a specified format, or do your best to find key-to-value pair patterns. If I were in your spot and couldn't pin down the possible carriage-return (line-ending) variations, I would probably push for input-file formatting standards as much as possible: things like a fixed number of columns in the table, or requiring the user to make the first line a header record so you can see where the key/value pairs repeat. Best of luck.
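The header-record suggestion might look roughly like this: take the first line as column names, guess the delimiter from it (tab if present, otherwise comma), and map each following line onto those names, warning when the field count doesn't match. `parse_with_header` and the column names `name`/`color` are hypothetical, the input is assumed to be already split into lines, and the plain `split` will mis-handle quoted CSV fields that contain commas; for real comma-separated files a module such as Text::CSV is the safer choice.

```perl
use strict;
use warnings;

# Treat the first line as a header record and map every following line
# onto its column names. Delimiter guess: tab if the header has one, else comma.
# Naive: does not handle quoted fields that contain the delimiter.
sub parse_with_header {
    my (@lines) = @_;
    my $header = shift @lines;
    return () unless defined $header;
    my $delim   = $header =~ /\t/ ? "\t" : ',';
    my @columns = split /\Q$delim\E/, $header;
    my @records;
    for my $line (@lines) {
        next unless length $line;
        my @fields = split /\Q$delim\E/, $line, -1;    # -1 keeps empty trailing fields
        warn "Expected ", scalar(@columns), " fields, got ", scalar(@fields), "\n"
            if @fields != @columns;
        my %row;
        @row{@columns} = @fields;    # hash slice: column name => field value
        push @records, \%row;
    }
    return @records;
}

# Hypothetical input, already split into lines.
my @input   = ("name\tcolor", "key1\tvalue1", "key2\tvalue2");
my @records = parse_with_header(@input);
print "$_->{name}: $_->{color}\n" for @records;
```

Each record is a hash reference keyed by the header names, so a missing or extra column shows up as a warning rather than silently shifting values into the wrong fields.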