
Could you post some example data, show how it's currently being parsed, and show how you'd like it to be parsed?

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re^3: Comparison of the parsing features of CSV (and xSV) modules
by Wally Hartshorn (Hermit) on Jun 15, 2004 at 21:14 UTC

    Here's an example:

    "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"

    I'm using a series of (somewhat fragile) regexes to change that to:

    "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books","Bugger"

    Wally Hartshorn

      Of course, as soon as you allow undoubled double-quotes inside a format that expects them doubled, there will be boundary cases that don't work as expected. However, Text::xSV lets you define an arbitrary filter that it uses to preprocess each line, and the following filter should do a reasonable job on the example above:
      sub {
          my $line = shift;
          $line =~ s/\r$//;
          $line =~ s/"(.)/""$1/g;
          $line =~ s/"?,"?/,/g;
          return $line;
      }
      Yes, there is some fragility, but it should be at least moderately hard to trigger.
      And, what should the parser do with the following:
      "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"

        And, what should the parser do with the following:

        Input:  "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
        Output: "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books,"Bugger"

        Input:  "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
        Output: "Smith","John",12/31/1962,Author of ""How to Break Programs"" and other books,"Bugger"

        Input:  "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
        Output: (Reject?)

        Input:  "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"
        Output: (Reject?)

        (I haven't encountered any improperly quoted data, just data that doesn't escape embedded delimiters.)
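
For illustration, here is a minimal sketch of the kind of regex-based cleanup described in this thread, using only core Perl. It is not the code I actually run and not part of Text::xSV; the name escape_embedded_quotes is made up for this example. It assumes that a structural quote always sits at the start of the line, at the end of the line, or right next to a comma, and it doubles every other quote.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::xSV or any CSV module.
# Assumption: a quote that opens or closes a field is always at the
# start of the line, at the end of the line, or adjacent to a comma.
# Every other double-quote is treated as embedded and is doubled.
sub escape_embedded_quotes {
    my ($line) = @_;

    # Strip a trailing LF or CRLF so the last quote is at end of string.
    $line =~ s/\r?\n\z//;

    # Double any quote that has a non-comma character on both sides.
    # Lookarounds keep the neighbouring characters out of the match.
    $line =~ s/(?<=[^,])"(?=[^,])/""/g;

    return $line;
}

my $raw = '"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"';
print escape_embedded_quotes($raw), "\n";
```

On the sample row this prints the doubled form shown earlier in the thread. The fragility is easy to see: an embedded quote that happens to touch a comma (for example `"He said "hi", then left"`) is mistaken for a field boundary, quotes that are already doubled get doubled again, and nothing is ever rejected, so the single-quote cases above pass through unchanged apart from the doubling.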

        Wally Hartshorn