in reply to CSV file

Other people will probably advise you to use some module for parsing CSV files.
I solved this problem a year ago, and I found this trick:
sub parseCSVrow {
    my ($row, $fdelim, $tdelim) = @_;
    chomp($row);
    my @ret = split(/$fdelim/, $row);
    my $i = 0;
    while ($i < @ret) {
        # s///g returns the number of text delimiters in this part
        if (($ret[$i] =~ s/$tdelim/$tdelim/ge) % 2) {
            if ($i + 1 == @ret) {
                die "ERROR: not ending text";
            } else {
                # odd count: rejoin with the next part
                $ret[$i] = $ret[$i].$fdelim.$ret[$i+1];
                splice(@ret, $i+1, 1);
            }
        } else {
            $i++;
        }
    }
    return \@ret;
}
The idea is to rejoin parts that were split apart whenever a part contains an odd number of text delimiters. But there is one big flaw: a CSV field can contain an EOL, and in that case my piece of code does not work.
Update: More precisely, it does work, but you cannot naively call this subroutine on each line; you have to change the code slightly.
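
A minimal sketch of one way to cope with an EOL inside a field. This is an assumption, not necessarily the change meant in the update: keep appending physical lines to a buffer while the number of text delimiters in it is odd, and only then pass the whole record to the parser. The name read_logical_record is hypothetical, and the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: read one logical CSV record from a filehandle,
# appending physical lines while the count of text delimiters is odd.
sub read_logical_record {
    my ($fh, $tdelim) = @_;
    my $record = <$fh>;
    return unless defined $record;
    # An odd count means a quoted field is still open at the end of the line.
    while ((() = $record =~ /\Q$tdelim\E/g) % 2) {
        my $next = <$fh>;
        die "ERROR: not ending text" unless defined $next;
        $record .= $next;
    }
    return $record;
}

# Usage: an in-memory filehandle stands in for a real file.
my $data = qq{1,"two\nlines",3\n4,5,6\n};
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
my @records;
while (defined(my $rec = read_logical_record($fh, '"'))) {
    push @records, $rec;
}
close($fh);
print scalar(@records), " records\n";
```

Each returned record still ends with its final EOL, so it can be passed to a row parser that calls chomp itself.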

Replies are listed 'Best First'.
Re^2: CSV file
by CountZero (Bishop) on Oct 31, 2005 at 13:39 UTC
    That's a nice trick, but will it work if you have escaped delimiters in your input?

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      I disagree:
      When the field delimiter appears as part of a field value (between text delimiters), the row is split at it, but the parts are joined again afterwards.
      I do not remember whether a field value can contain the text delimiter. I think it can, but doubled, and in that case the count of text delimiters modulo 2 is unchanged.
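
The parity argument can be checked with a small sketch. It assumes the double quote as text delimiter and the comma as field delimiter; count_tdelim is a hypothetical helper and the sample field is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of the text delimiter in a string.
sub count_tdelim {
    my ($text, $tdelim) = @_;
    my $count = () = $text =~ /\Q$tdelim\E/g;
    return $count;
}

# One quoted field containing a field delimiter and a doubled text delimiter.
my $field = '"say ""hi"", please"';

# Splitting on the comma cuts the field into two parts; each part has an
# odd number of quotes, which is the signal to join them back together.
my @parts = split(/,/, $field);
printf "%s => %d\n", $_, count_tdelim($_, '"') for @parts;

# The complete field has an even number of quotes, doubled ones included.
printf "%s => %d\n", $field, count_tdelim($field, '"');
```

Because a doubled text delimiter always adds two to the count, it never changes whether the count is odd or even.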
Re^2: CSV file
by azaria (Beadle) on Oct 31, 2005 at 13:55 UTC
    Thanks for your reply! What is the meaning of $fdelim and $tdelim? Azaria
      $tdelim is the text delimiter, typically double quotes or apostrophes, and $fdelim is the field delimiter, typically a comma (CSV stands for Comma Separated Values).
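
To make the two parameters concrete, here is a small usage sketch. The sub is repeated from the parent post (only reformatted) so that the snippet runs on its own; the sample row is made up.

```perl
use strict;
use warnings;

# parseCSVrow from the parent post, repeated so the example is self-contained.
sub parseCSVrow {
    my ($row, $fdelim, $tdelim) = @_;
    chomp($row);
    my @ret = split(/$fdelim/, $row);
    my $i = 0;
    while ($i < @ret) {
        # s///g returns the number of text delimiters in this part
        if (($ret[$i] =~ s/$tdelim/$tdelim/ge) % 2) {
            if ($i + 1 == @ret) {
                die "ERROR: not ending text";
            } else {
                # odd count: rejoin with the next part
                $ret[$i] = $ret[$i] . $fdelim . $ret[$i+1];
                splice(@ret, $i+1, 1);
            }
        } else {
            $i++;
        }
    }
    return \@ret;
}

# Comma as field delimiter, double quote as text delimiter.
my $fields = parseCSVrow(qq{1,"Smith, John",42\n}, ',', '"');
print "$_\n" for @$fields;
```

Note that the text delimiters are kept in the returned values, so the second field comes back with its surrounding quotes.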