in reply to Writing a CSV Parser/Printer

Quite complicated, your code ;-)

Question: Why is Text::CSV not an option?

Question: Have you considered using a regex for this task instead of splitting the line into characters and building your string piece by piece?

Question: Could you please provide line numbers? I can't reproduce your error messages here.

Answer: One problem I see is your "my $datastring". You want to collect your data in that scalar, but you clear it each time through the loop. That way you will never get anything useful out of the loop.
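
To make the scoping point concrete, here is a minimal sketch. The original code isn't shown here, so the loop shape and the helper names collect_inside and collect_outside are assumptions made purely for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.

# Buggy shape: the accumulator is declared inside the loop,
# so every iteration starts again with a fresh, empty scalar.
sub collect_inside {
    my @chars = @_;
    my $last  = '';
    foreach my $char (@chars) {
        my $datastring = '';     # re-created on every pass
        $datastring .= $char;
        $last = $datastring;     # only ever holds one character
    }
    return $last;
}

# Fixed shape: declare the accumulator once, before the loop.
sub collect_outside {
    my @chars      = @_;
    my $datastring = '';
    foreach my $char (@chars) {
        $datastring .= $char;
    }
    return $datastring;
}

print collect_inside(qw(c s v)),  "\n";   # prints "v"
print collect_outside(qw(c s v)), "\n";   # prints "csv"
```

Moving the declaration above the loop is the only difference between the two subs.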

Re: Re: Writing a CSV Parser/Printer
by Anonymous Monk on Jun 26, 2003 at 06:46 UTC
    Quite complicated, your code ;-)

    Yep, that's the source of the problem ;)

    As for regexes - I'm not very good with them, so I fell back on the C-style approach. I'm not sure how to add line numbers - can this be done through PerlMonks?

    As for Text::CSV - I can't install modules on the server (I can upload pure-Perl ones though). I definitely don't have a problem with using modules for this type of tedious, error-prone endeavour. Any suggestions of alternatives are welcome.

    You're quite right about $dataString - I moved it out of the loop and it gets rid of the errors. The out.csv file is still just a bunch of quotes and commas.

    Here's the slightly modified code:

    Thanks for the help :)

      You'll have to add the line numbers yourself; PerlMonks can't do that for you. You can generate them with:

      perl -pe '$_="$.: $_"' your_input > your_output
      I'm not sure what your desired output should look like. Maybe this will help you. It uses a regex:

      use strict;
      use warnings;

      while (<DATA>) {
          chomp;    # drop the newline so the last field prints cleanly
          my @fields = split /, /;
          foreach (@fields) {
              if (s/^"((?:[^"\\]|\\.)*)"$/$1/) {    # field is correctly quoted
                  tr/\\//d;    # remove the backslashes
                  print "$_\n";
              }
          }
      }
      __END__
      "Perlmonks", "http://www.perlmonks.org", "excellent ;)"
      "csv", "csv\"xxx", "trall\ala"
      Short explanation for the RegEx:

      s/^"((?:[^"\\]|\\.)*)"$/$1/

      ^"
      matches your field's quotechar at the start of the field
      (...)
      will "remember" what was matched inside the quotes
      (?:...)*
      groups whatever stands in place of the ... without remembering it; the * means the group may appear as often as possible, even zero times
      [^"\\]
      will match any character but " and \
      |
      is an alternative. Either the left or the right part has to match
      \\.
      will match a backslash followed by any character, i.e. any "escaped" character
      "$
      again your quotechar but now at the end
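
The same pattern can be written with the /x modifier so that each piece carries its own comment. This is only a sketch: the helper name unquote_field is hypothetical, and it removes every backslash afterwards in the same way as the tr in the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unquoted field content, or undef
# if the string is not one complete double-quoted field.
sub unquote_field {
    my ($field) = @_;
    if ($field =~ m{
            ^"              # opening quote at the start of the field
            (               # remember the content in $1
                (?:         # group without remembering
                    [^"\\]  # any character except a quote or a backslash
                  |         # or
                    \\.     # a backslash followed by any character
                )*          # as often as possible, even zero times
            )
            "$              # closing quote at the end of the field
        }x) {
        (my $content = $1) =~ tr/\\//d;   # remove the backslashes
        return $content;
    }
    return;
}

print unquote_field('"trall\ala"'), "\n";   # prints "trallala"
```

Note that $ also matches just before a trailing newline, so a field that still ends in "\n" is accepted too.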

        Thanks for the excellent explanation :)

        With regard to the error line numbers: there aren't any errors anymore - the code just doesn't produce the desired results. I have a feeling it's still quite a way from working, too - your approach is far clearer.

        One question about the split if it's fed data like:

        "csv", "csv\"x, xx", "trall\ala"

        It will choke on the second entry. How would I go about avoiding this? I could use something like split /","/; which would make problems far less likely, but is there a better way? Is there some sort of notation for "inside a field"?
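
One possible way around this, offered as a sketch rather than as an answer from the thread: match the quoted fields themselves instead of splitting on the separator, reusing the pattern from the reply above. The helper name quoted_fields is hypothetical, and the sketch assumes every field is double-quoted.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the quoted fields out of one line by
# matching them, rather than splitting on the separator.
sub quoted_fields {
    my ($line) = @_;
    # In list context, /g returns the captured content of every match.
    my @fields = $line =~ /"((?:[^"\\]|\\.)*)"/g;
    tr/\\//d for @fields;    # remove the backslashes, as before
    return @fields;
}

my $line = '"csv", "csv\"x, xx", "trall\ala"';

my @naive  = split /, /, $line;     # also splits inside the second field
my @fields = quoted_fields($line);

print scalar(@naive), " pieces from split\n";   # prints "4 pieces from split"
print "$_\n" for @fields;
```

Because the pattern is not anchored, this does not check what sits between the fields and it silently skips anything unquoted, so it is no substitute for a real CSV module on messy input.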