in reply to Detect line endings with CGI.pm upload

The input files are all generated by various third-party databases and saved as either tab-delimited or comma-separated text files. But maybe I've been staring at this too long, because I must still be missing something stupid. Here is an example input file, as seen with "less test.txt" in either a Mac OS X terminal or on the Linux console:
key1{tab}value1{\cM}key2{tab}value2{\cM}key3{tab}value3{\cM}key4{tab}value4
Test Perl script:
#!/usr/bin/perl -T

use CGI;
my $cgi = new CGI;

my $file = $cgi->upload('file');

print "Content-type: text/plain\n\n";

$file =~ s/(\x0d?\x0a|\x0d)/\n/smg;
while (my $line = <$file>)
{ 
  chop $line;
  ($key, $value) = split(/\t/,$line);
  print "key: $key\nvalue: $value\n";
}
Output:
key: key1
value: value1{\cM}key2
where {tab} is a tab character and {\cM} is a Control-M character (carriage return, \x0d).

Re^2: Detect line endings with CGI.pm upload
by Anonymous Monk on Dec 27, 2008 at 01:12 UTC

    Well, if you knew the number of pairs or the length of each field, I assume you could work out the easy answer yourself, but I'm guessing you're looking at variable-length fields.

    For something dynamic, I would look at the dos2unix docs to see what the tool removes to make a file Unix-compatible (though I'm guessing you've been there already).
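For reference, here is a rough sketch of how one might check which line-ending convention a file actually uses. It counts CRLF (DOS/Windows), lone CR (classic Mac OS, the Control-M in the example input) and lone LF (Unix) sequences. The helper name count_line_endings and the sample string are made up for illustration; nothing here is taken from dos2unix itself.

```perl
use strict;
use warnings;

# Hypothetical helper: count each line-ending style found in a chunk of text.
sub count_line_endings {
    my ($text) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    # Alternation order matters: CRLF is tried before a lone CR or LF,
    # so a CRLF pair is counted once rather than as two separate breaks.
    while ($text =~ /(\x0d\x0a|\x0d|\x0a)/g) {
        my $eol = $1;
        if    ($eol eq "\x0d\x0a") { $count{crlf}++ }
        elsif ($eol eq "\x0d")     { $count{cr}++ }
        else                       { $count{lf}++ }
    }
    return %count;
}

# Sample data modelled on the test file in the question (CR-only endings).
my %seen = count_line_endings("key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3");
print "$_: $seen{$_}\n" for sort keys %seen;
```

If the cr count dominates, the file most likely came from a tool that writes classic Mac line endings, which would explain why reading it line by line on Unix yields one long line.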

    Outside of all that, I would be inclined to try something like:
    while (my $line = <INFILE>) {
        $line =~ s/\n$//;
        # the {\cM} should create a word boundary since it's not alphanumeric
        while ($line =~ s/(\w+)\t(\w+)//) {
            print "key: $1\nvalue: $2\n";
        }
    }

    If you're looking at something where the line feeds are not showing up at all (so that the whole file is read in as one line), I'm not 100% sure. I'll think about it, but I have not run across that particular scenario yet.
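For the one-long-line case, here is a minimal sketch, assuming the whole upload fits comfortably in memory: treat the content as a single string, split it on any of the three line-ending styles, then split each line on the first tab. The helper names split_lines and parse_pairs and the sample data are hypothetical; in the CGI script the text would presumably be slurped from the upload handle rather than hard-coded.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw text into lines regardless of line-ending style.
# Empty lines (for example from a trailing line ending) are dropped.
sub split_lines {
    my ($text) = @_;
    # CRLF is listed first so it is not treated as two separate breaks.
    return grep { length } split /\x0d\x0a|\x0d|\x0a/, $text;
}

# Hypothetical helper: turn tab-delimited lines into [key, value] pairs.
sub parse_pairs {
    my ($text) = @_;
    return map { [ split /\t/, $_, 2 ] } split_lines($text);
}

# Stand-in for the uploaded content. With a real upload handle in $file,
# the text could be read in one go with: my $text = do { local $/; <$file> };
my $text = "key1\tvalue1\x0dkey2\tvalue2\x0dkey3\tvalue3\x0dkey4\tvalue4";

for my $pair (parse_pairs($text)) {
    my ($key, $value) = @$pair;
    $value = '' unless defined $value;   # line without a tab
    print "key: $key\nvalue: $value\n";
}
```

The substitution in the original script is applied to $file, which holds the upload handle rather than the file's contents, so it most likely never touches the data. Any normalizing has to happen on the text after it has been read.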

      It's one long line of input, at least with this particular test file. And your guess is correct: the data is variable-length, so we can't just count characters or anything like that.
        So, how do you know what key1 represents? To put it another way: how do you know which column key1 is a unique value in? Or am I missing something obvious? I'm trying to get a picture of this dataset as it was built from a database. Do you have a simple snapshot of the datasets you might be receiving?
        Sounds like a tough spot you're in. The only thing I can suggest is to either get the user uploading the data to do it in a specified format, or do your best to find key-to-value pair patterns. If I were in your spot and couldn't pin down the possible carriage return values, I would probably pursue input file formatting standards as far as possible: things like a fixed number of columns in the table, or requiring the user to make the first line a 'header record' so you can see where the key-value pairs repeat. Best of luck.