in thread Detect line endings with CGI.pm upload

Well, if you know the length of each pair, I assume you could figure out the easy answer, but I'm guessing you're dealing with variable-length fields.

For something dynamic, I would look at the dos2unix docs to see what the tool strips out to make a file Unix-compatible (though I'm guessing you've been there already).
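As a rough sketch of what that conversion amounts to, assuming the whole upload has already been read into one string: dos2unix turns CR LF pairs into LF, and its mac2unix mode turns lone CRs into LF. A single substitution covers both. `normalize_newlines` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: convert Windows (CR LF) and classic Mac (CR)
# line endings to Unix (LF), roughly what dos2unix/mac2unix do.
sub normalize_newlines {
    my ($text) = @_;
    # \r\n? matches a CR optionally followed by an LF, so both CR LF and
    # a lone CR become a single LF; existing bare LFs are left untouched.
    $text =~ s/\r\n?/\n/g;
    return $text;
}

my $mac = "apple\tred\rorange\torange\r";
print normalize_newlines($mac);
```

After this step the data can be handled as ordinary newline-terminated lines, whichever platform it came from.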

Outside of all that, I would be inclined to try something like:
    # assumes INFILE has already been opened on the uploaded file
    while (my $line = <INFILE>) {
        $line =~ s/\n$//;
        while ($line =~ s/(\w+)\/(\w+)//) {
            # the {\cM} should create a word boundary since it's not alphanumeric
            print "key: $1\nvalue: $2\n";
        }
    }

If the line feeds don't show up at all (so the whole file is read in as one line), I'm not 100% sure. I'll think about it, but I haven't run across that particular scenario yet.
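One possible approach to the "whole file is one line" case, sketched under the assumption that each file uses a single line-ending convention throughout: look at the first line terminator in the data and treat that as the file's ending. `detect_line_ending` is a hypothetical helper, and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's line ending from the first line
# terminator found in the data. Returns "\r\n", "\r", "\n", or undef
# when the sample contains no terminator at all.
sub detect_line_ending {
    my ($sample) = @_;
    # \r\n is listed first so a CR LF pair is not reported as a lone CR.
    return unless $sample =~ /(\r\n|\r|\n)/;
    return $1;
}

my $data = "apple\tred\rorange\torange\rgrape\tgreen\r";
my $eol  = detect_line_ending($data) // "\n";   # fall back to LF

# With the ending known, split the single "long line" into records.
# \Q...\E quotes the terminator so it is matched literally.
my @records = split /\Q$eol\E/, $data;
print scalar(@records), " records\n";
```

Once the ending is known, you could also set `local $/ = $eol;` before reading the filehandle, so that `<$fh>` returns one record at a time.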

Re^3: Detect line endings with CGI.pm upload
by apu (Sexton) on Dec 27, 2008 at 04:51 UTC
    It's one long line of input, at least with this particular test file. And your guess is correct: varied-length data, so we can't just count characters or anything like that.
      So how do you know what key1 represents? To put it another way, which column is key1 a unique value in? Or am I missing something obvious? I'm trying to get a picture of this dataset built from a database. Do you have a simple snapshot of the data you might be receiving?

        Not the real data, but think of it as:

        apple      red
        orange     orange
        grape      green
        apple      green
        

        Neither the keys nor the values need to be unique; the script takes care of merging multiple values for a single key where needed. The source database thinks it is outputting key{tab}value{newline}, but because of the different line endings I get key{tab}value{\cM} instead, at least when the end user creates the file on a Mac. Other end users may create this source file on Windows, where I get yet another line ending, so I need to accommodate any line ending. This is also one of three source files, each of which could come from a different system before the end user uploads them through this CGI script.
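A minimal sketch of that parsing step, assuming exactly one tab per record and that the three endings LF, CR LF, and lone CR are the only ones to expect. It splits on any of them and merges repeated keys into a hash of arrays; `parse_pairs` is a hypothetical name, and the sample is the fruit data above.

```perl
use strict;
use warnings;

# Sketch: parse key{tab}value records whose line endings may be LF,
# CR LF, or a lone CR, merging repeated keys into a list of values.
# parse_pairs is a hypothetical name; it assumes one tab per record.
sub parse_pairs {
    my ($text) = @_;
    my %values_for;
    # \r\n comes first in the alternation so CR LF counts as one ending.
    for my $record (split /\r\n|\r|\n/, $text) {
        next if $record eq '';                      # skip blank lines
        my ($key, $value) = split /\t/, $record, 2;
        next unless defined $value;                 # skip records with no tab
        push @{ $values_for{$key} }, $value;
    }
    return \%values_for;
}

my $upload = "apple\tred\rorange\torange\rgrape\tgreen\rapple\tgreen\r";
my $pairs  = parse_pairs($upload);
for my $key (sort keys %$pairs) {
    print "$key: @{ $pairs->{$key} }\n";
}
```

Because the split accepts all three conventions at once, the same code handles Mac, Windows, and Unix uploads without first detecting which one was used.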

        I was hoping there was a Perl/CGI.pm equivalent of FTP's "ASCII" mode.
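As far as I know there is no switch like that in CGI.pm, and Perl's `:crlf` I/O layer only translates CR LF, not a lone CR. A common workaround is to read the upload handle in binary mode and split the content yourself. The sketch below takes any filehandle; with CGI.pm you would pass it the handle returned by `$q->upload('datafile')`, where `'datafile'` is a hypothetical form field name. `read_records` is also hypothetical, and an in-memory handle stands in for the upload here.

```perl
use strict;
use warnings;

# Sketch: read every record from a filehandle whether the file uses
# LF, CR LF, or CR line endings. read_records is a hypothetical name.
sub read_records {
    my ($fh) = @_;
    binmode $fh;                               # raw bytes, no layer translation
    my $content = do { local $/; <$fh> };      # slurp the whole upload
    return () unless defined $content;
    # Split on any line ending and drop empty records.
    return grep { length } split /\r\n|\r|\n/, $content;
}

# Stand-in for an uploaded file: an in-memory handle with Mac endings.
open my $fh, '<', \"apple\tred\rorange\torange\r" or die "open: $!";
my @records = read_records($fh);
print "$_\n" for @records;
```

Slurping is fine for small uploads like this one; for very large files, detecting the ending first and setting `$/` would avoid holding everything in memory.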

      Sounds like a tough spot you're in. The only thing I can suggest is either to have the user upload the data in a specified format, or to do your best to find key-to-value pair patterns. If I were in your spot and couldn't pin down the possible carriage-return values, I would probably push for input-file formatting standards as far as possible: things like a fixed number of columns in the table, or requiring the user to make the first line a 'header record' so you could see where the key/value pairs repeat. Best of luck.
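If you do go the "agreed format" route, the check itself can be small. This is a sketch under assumed rules: the first record must equal an expected header (here a hypothetical `key\tvalue`), and every data record must have as many tab-separated columns as the header. `check_format` is a hypothetical name; it returns a problem description, or undef when the upload looks fine.

```perl
use strict;
use warnings;

# Sketch: validate an upload against an agreed format. The first record
# must be the expected header, and every data record must have the same
# number of tab-separated columns. Returns a problem string, or undef.
sub check_format {
    my ($text, $expected_header) = @_;
    # Accept LF, CR LF, or lone CR line endings; drop blank records.
    my @records = grep { length } split /\r\n|\r|\n/, $text;
    return 'empty upload' unless @records;

    my $header = shift @records;
    return 'missing or unexpected header record'
        unless $header eq $expected_header;

    # A limit of -1 keeps trailing empty fields so they are counted.
    my @header_fields = split /\t/, $header, -1;
    my $columns = @header_fields;
    for my $i (0 .. $#records) {
        my @fields = split /\t/, $records[$i], -1;
        next if @fields == $columns;
        return sprintf('data record %d has %d columns, expected %d',
                       $i + 1, scalar(@fields), $columns);
    }
    return;    # no problems found
}

my $upload  = "key\tvalue\rapple\tred\rorange\torange\r";
my $problem = check_format($upload, "key\tvalue");
print defined $problem ? "rejected: $problem\n" : "format OK\n";
```

Rejecting a malformed file up front with a clear message is usually easier on the user than guessing at its structure afterwards.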