PerlMonks  

Re^4: Comparison of the parsing features of CSV (and xSV) modules

by dragonchild (Archbishop)
on Jun 15, 2004 at 23:17 UTC ( [id://367080]=note )


in reply to Re^3: Comparison of the parsing features of CSV (and xSV) modules
in thread Comparison of the parsing features of CSV (and xSV) modules

And, what should the parser do with the following:

"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
"Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
"Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
"Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re^5: Comparison of the parsing features of CSV (and xSV) modules
by Wally Hartshorn (Hermit) on Jun 16, 2004 at 16:13 UTC
    And, what should the parser do with the following:

    "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
    "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books,"Bugger"

    "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
    "Smith","John",12/31/1962,Author of ""How to Break Programs"" and other books,"Bugger"

    "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
    (Reject?)

    "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"
    (Reject?)

    (I haven't encountered any improperly quoted data, just data that doesn't escape embedded delimiters.)

    Wally Hartshorn
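
One way to read "dusty" data of this kind can be sketched as follows. This is a minimal illustration in Python, not the behaviour of any module compared in this thread; the function name `split_dusty` and its heuristic are assumptions made for the example. Inside a quoted field, a quote is taken as the closing quote only when it is followed by the delimiter or the end of the line; a doubled quote is an escaped quote; any other quote is kept as literal text.

```python
def split_dusty(line, delim=",", quote='"'):
    """Split one xSV line, tolerating unescaped quotes inside quoted fields.

    Heuristic (an assumption for this sketch, not a standard): a quote inside
    a quoted field closes the field only if the next character is the
    delimiter or the end of the line.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: scan until a quote that is followed by delim/EOL.
            i += 1
            buf = []
            closed = False
            while i < n:
                ch = line[i]
                if ch == quote:
                    nxt = line[i + 1] if i + 1 < n else None
                    if nxt == quote:
                        # Properly escaped quote ("").
                        buf.append(quote)
                        i += 2
                        continue
                    if nxt is None or nxt == delim:
                        # Closing quote.
                        i += 1
                        closed = True
                        break
                    # Stray, unescaped quote: keep it as data.
                    buf.append(quote)
                    i += 1
                    continue
                buf.append(ch)
                i += 1
            if not closed:
                raise ValueError("unterminated quoted field")
            fields.append("".join(buf))
        else:
            # Unquoted field: runs to the next delimiter; quotes are literal.
            j = line.find(delim, i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            break
        i += 1  # skip the delimiter
    return fields


line1 = '"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"'
line2 = '"Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"'

print(len(split_dusty(line2)), split_dusty(line2))
print(len(split_dusty(line1)), split_dusty(line1))
```

Under this heuristic the second sample line splits into five fields, but the first does not: its fourth field opens with a quote that is never closed before the comma, so the rest of the line, including "Bugger", is absorbed into that field. The heuristic produces an answer, but it is still only a guess at what the data's author intended.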

      What about the following:
      abcd,"efgh,"ijkl,"mnop",qrst
      Is that malformed, or is it meant to be the following?
      abcd,"efgh,""ijkl,""mnop",qrst

      The issue is that there are too many edge cases for a general-purpose parser to handle. I'm coming up with a bunch and I'm not even trying hard.
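
The ambiguity can be seen by feeding both spellings to an existing parser. The sketch below uses Python's standard `csv` module purely as a stand-in (it is not one of the Perl modules under comparison); the helper name `parse` and the variable names are assumptions for the example. The module's `strict` option decides whether a quote followed by ordinary text is an error or is silently tolerated.

```python
import csv

ambiguous = 'abcd,"efgh,"ijkl,"mnop",qrst'
doubled = 'abcd,"efgh,""ijkl,""mnop",qrst'


def parse(line, strict=False):
    """Parse a single line with the standard csv module."""
    return next(csv.reader([line], strict=strict))


# Lenient (default) mode accepts the ambiguous line and picks one reading.
lenient_fields = parse(ambiguous)

# The explicitly doubled form is well-formed and has a single reading.
doubled_fields = parse(doubled)

# Strict mode refuses to guess.
try:
    parse(ambiguous, strict=True)
    strict_result = "accepted"
except csv.Error as exc:
    strict_result = f"rejected: {exc}"

print(lenient_fields)
print(doubled_fields)   # ['abcd', 'efgh,"ijkl,"mnop', 'qrst']
print(strict_result)
```

The lenient reading and the doubled-quote reading do not agree on the number of fields, and strict mode rejects the ambiguous line outright. Which of the three outcomes is "right" depends on what the producer of the data meant, which the parser cannot know.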

        Ah! Now I know what you're asking. Well, my point wasn't that CSV handlers need to be able to handle every possible bit of crud that is thrown at them. I was just saying that it would be useful if they would handle unescaped embedded delimiters -- perhaps not absolutely dirty data, but at least somewhat dusty data. :-)

        Wally Hartshorn
