Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,
I have a CSV to work with, but some of the entries are quoted text and contain commas.
split(/,/, $line) chops these up, so I either need to glue the right parts back together (seemingly impossible) or avoid splitting on commas that are between quotes, e.g. "random, tripe". Sadly, not every entry is quoted, so I can't break the line up by the quotes either.
Any ideas on how to do this?
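For illustration, here is a minimal sketch of the failure, using a made-up sample line. split(/,/) has no notion of quotes, so the quoted entry gets cut in two.

```perl
use strict;
use warnings;

# Hypothetical sample line: the middle entry is quoted and contains a comma.
my $line = 'first,"random, tripe",last';

# split(/,/) ignores quotes, so it also cuts inside "random, tripe".
my @fields = split /,/, $line;

print ">$_<\n" for @fields;
# prints:
# >first<
# >"random<
# > tripe"<
# >last<
```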

Re: Regex to do a smart split()
by halley (Prior) on Jan 30, 2004 at 15:22 UTC

    There's plenty of CSV-handling code already written that deals with all of these issues (and more). CPAN is a library of freely available Perl modules that have been tested and documented for easy use in your projects. Browse search.cpan.org; for example, look at Text::CSV.

    --
    [ e d @ h a l l e y . c c ]

      Do note that Text::CSV has been sitting at a 0.01 release since 1997 and should be considered deprecated. Text::CSV_XS is a much better choice. It's faster and has only been sitting since 2001 (a little better, anyway).
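      A minimal sketch of what using Text::CSV_XS looks like, assuming the module has been installed from CPAN (it is not part of core Perl). The sample line is made up; the calls used are the module's new(), parse(), fields() and error_diag().

```perl
use strict;
use warnings;
use Text::CSV_XS;    # CPAN module, must be installed separately

# binary => 1 permits embedded newlines and non-ASCII bytes in quoted fields.
my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create parser: " . Text::CSV_XS->error_diag();

# Hypothetical input: quoted and unquoted entries mixed, one quoted entry
# containing a comma, plus an empty field.
my $line = 'one,"random, tripe",,four';

# parse() returns false on malformed input; fields() returns the values
# with the surrounding quotes removed.
$csv->parse($line) or die "Parse failed: " . $csv->error_diag();
my @fields = $csv->fields();

print ">$_<\n" for @fields;
```

      Note that, per the reply below, there is disagreement over whether the XS or the plain-Perl module copes with more real-world data, so test against your own files.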

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        No, it's the other way around. Text::CSV works on more data than Text::CSV_XS. I never looked into what the issue was, but the XS version is clearly inferior to the plain-Perl version, because the plain-Perl version works on data where the XS version flat-out fails.
        and should be considered deprecated.
        Because it's old?
      Thanks for the super-fast response. I'd completely forgotten about the existance of CPAN...
        I'd completely forgotten about the existance of CPAN... (sic)

        Yer not from around here, are ye? :p

        --
        Allolex

Re: Regex to do a smart split()
by gryphon (Abbot) on Jan 30, 2004 at 15:28 UTC

    Greetings Anonymous,

    I recommend you take a different approach by looking at DBD::CSV or Text::CSV, depending on your exact needs. I deal with a lot of database work, so the former feels more comfortable to me than the latter. However, the choice should depend on how you want to use the data.

    This is sort of a "buy vs build" argument, only it's "free vs build" in this case. As an extreme example, you can write your own HTML parser, but why bother when others have done your work for you?

    gryphon
    code('Perl') || die;

Re: Regex to do a smart split()
by flounder99 (Friar) on Jan 30, 2004 at 19:23 UTC
    Text::CSV is nice but it doesn't work if the field is "partially" quoted.
    use strict;
    use warnings;

    my $line = join ",", (
        '"completely quoted"',
        'not quoted',
        'partially "quoted"',
        '"completely quoted with embedded, comma"',
        'partially "quoted with embedded, comma"',
        'quote \'and comma, inside " single\' quote',
        'single quote "and comma, inside of \' double" quote',
        'last'
    );

    # A field is a run of: any character that is not a quote or comma,
    # a "double-quoted" string, or a 'single-quoted' string.
    my $re = qr/((?:[^"',]|"[^"]*"|'[^']*')+)/;
    print(">$_<\n") for $line =~ /$re/g;
    outputs
    >"completely quoted"<
    >not quoted<
    >partially "quoted"<
    >"completely quoted with embedded, comma"<
    >partially "quoted with embedded, comma"<
    >quote 'and comma, inside " single' quote<
    >single quote "and comma, inside of ' double" quote<
    >last<
    It will choke on unbalanced quotes, but it's quick and dirty.

    UPDATE

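    It will also mishandle empty fields. For

    one,,three

    it returns only two fields, one and three, because the + requires at least one character per field.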
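    One possible way around the empty-field problem, sketched here as a variation on the regex above (an assumption, not flounder's own code): anchor each match with \G at either the start of the string or a separating comma, and allow an empty field body by using * instead of +. The helper name split_fields is hypothetical. Unbalanced quotes are still not handled; the loop simply stops early.

```perl
use strict;
use warnings;

# Hypothetical helper. Each match begins where the previous one ended (\G),
# consumes either the start of the string or one separating comma, then
# captures a possibly empty field made of non-quote/non-comma characters,
# "double-quoted" strings, or 'single-quoted' strings.
sub split_fields {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:^|,)((?:[^"',]|"[^"]*"|'[^']*')*)/g) {
        push @fields, $1;
    }
    return @fields;
}

my @fields = split_fields('one,,three');
print ">$_<\n" for @fields;
# prints:
# >one<
# ><
# >three<
```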

    --

    flounder

Re: Regex to do a smart split()
by Anonymous Monk on Jan 30, 2004 at 20:45 UTC
    Text::ParseWords will handle this for simple cases.
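    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my $str = qq/ "first item", "comma inside, item", a, b, c, 'last item' /;

    # Split on commas; a keep value of 0 strips the quotes from each field.
    # Whitespace around the commas stays attached to the fields (e.g. " a").
    my @a = parse_line(',', 0, $str);
    print "$_\n" for @a;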
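    A small variation, offered as a sketch rather than part of the original reply: parse_line's first argument is a regular expression, so passing '\s*,\s*' lets the delimiter absorb the spaces around each comma, and trimming the ends of the string first removes the leading and trailing blanks. Text::ParseWords ships with core Perl.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $str = qq/ "first item", "comma inside, item", a, b, c, 'last item' /;

# Trim leading/trailing whitespace from the whole line.
(my $trimmed = $str) =~ s/^\s+|\s+$//g;

# The delimiter is a regex: a comma plus any surrounding whitespace.
# keep = 0 removes the quotes around quoted fields.
my @a = parse_line('\s*,\s*', 0, $trimmed);

print "$_\n" for @a;
```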