ecuguru has asked for the wisdom of the Perl Monks concerning the following question:

I have a line of comma-separated data from an Excel export. The columns are:
Name,1,2,3
and a row comes out as:
"Last,First",1,2,3...
The comma between Last and First is breaking my split call.
Whenever a field has a comma inside it, Excel puts quotes around the entire cell to indicate that it is one field, not two.

To parse it, I'm planning to split the line on the double quotes first, then split the rest of the line on commas. BUT this isn't a very flexible approach, and if other fields in the line are quoted too I won't accommodate them at all.
Is there a way for split to ignore delimiters inside other delimiters, such as a comma inside quotes, or is there just a better way to do this?

Thanks,
Tim

Re: Split line with extra delim
by jdtoronto (Prior) on Aug 23, 2005 at 20:48 UTC
      There's always a lib. That'll do fine. Would be curious if people know a way to do this without the lib, but this does exactly what I need. Thanks!
        In nearly thirty-five years of engineering I have found that, whilst wheels are wonderful things to invent, I could waste far too much time re-inventing all of the wheels I have had to deal with! I therefore subscribe to that wonderful programmer's virtue, laziness, akin to the great engineering virtue of expediency.

        Sure, you can do it yourself, and module source code is a great place to learn. The parse method in Text::CSV is simply elegant. But before I started using the module back in 2000, I maintained a bundle of my own code that kept running into problems. The module has never let me down.
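        For reference, a minimal sketch of the module approach applied to the sample line from the question. It assumes Text::CSV is installed from CPAN; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Text::CSV;

# binary => 1 allows embedded newlines and non-ASCII bytes in fields
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Sample line from the question
my $line = '"Last,First",1,2,3';

my @fields;
if ($csv->parse($line)) {
    # fields() returns the columns of the most recently parsed line
    @fields = $csv->fields;
} else {
    die "Could not parse line: " . $csv->error_diag;
}

print join(' | ', @fields), "\n";   # Last,First | 1 | 2 | 3
```

        The quoted name comes back as a single field with its quotes removed, so nothing needs to be split by hand.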

        jdtoronto

Re: Split line with extra delim
by QM (Parson) on Aug 23, 2005 at 21:19 UTC
    jdtoronto has the best answer, but you appear to want more (or less ;).

    One might think splitting with a regex would work, like this:

    @x = split /"?,"?/, $chunk; # broken
    but that still splits the quoted field, because the comma inside "Last,First" matches the pattern just like any other comma.
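    A small demonstration of the failure, using the sample line from the question (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $chunk = '"Last,First",1,2,3';

# Plain split on commas: the quoted field is cut in two
my @naive = split /,/, $chunk;

# Split on a comma with optional surrounding quotes: still cut in two,
# and the opening quote is left on the first piece
my @regex = split /"?,"?/, $chunk;

print scalar(@naive), " fields: ", join(' | ', @naive), "\n";
# 5 fields: "Last | First" | 1 | 2 | 3
print scalar(@regex), " fields: ", join(' | ', @regex), "\n";
# 5 fields: "Last | First | 1 | 2 | 3
```

    Either way you get five fields where there should be four.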

    Instead you can grab quoted fields and non-quoted fields, and process them separately. However, you quickly get into issues with escaped quotes, negative look-behinds, etc. It gets messy pretty quickly, and difficult to maintain.

    It's essentially parsing a complex syntax, so let a parser do it :)
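    For the curious, here is a rough sketch of the "grab quoted and non-quoted fields" idea. It is only an illustration under stated assumptions, not a replacement for the module: split_csv_line is a hypothetical helper, and it assumes Excel-style quoting (a quote inside a quoted field is doubled as ""), no newlines inside fields, and otherwise well-formed input.

```perl
use strict;
use warnings;

# Hypothetical helper: split one CSV line into fields.
# Assumes embedded quotes are doubled ("") and fields contain no newlines.
sub split_csv_line {
    my ($line) = @_;
    my @fields;
    my $complete = 0;

    # Each pass consumes one field plus its terminator (a comma or end of string)
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,"]*))(,|\z)/g) {
        # Copy the captures before running another regex
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # undo the doubled quotes
            push @fields, $quoted;
        } else {
            push @fields, $plain;
        }
        # An empty terminator means we matched end of string
        if ($sep eq '') {
            $complete = 1;
            last;
        }
    }
    die "Malformed CSV line: $line\n" unless $complete;
    return @fields;
}

my @fields = split_csv_line('"Last,First",1,2,3');
print join(' | ', @fields), "\n";   # Last,First | 1 | 2 | 3
```

    Even this small version needs care around the end of the line and the doubled quotes, and it still rejects anything slightly irregular, which is the maintenance burden mentioned above.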

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Split line with extra delim
by ambrus (Abbot) on Aug 24, 2005 at 08:47 UTC

    A simple way could be to convince Excel to use some other character (instead of a comma) for separating fields. A tab is probably a good choice.

      A tab is probably a good choice.
      That just makes tabs problematic inside double quotes. Granted, with enough searching and foreknowledge, one can pick a character that never occurs. But never is not as long as we think, and a proven solution is better than a statistical one.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of