michellem has asked for the wisdom of the Perl Monks concerning the following question:

This seems to me like it probably has an obvious solution, but I'm not seeing it.

I've got comma delimited output from a database (Acess), and text fields output with quotes around them, and non-text fields (#s) don't. I need to split a line of this output into an array. Some of the text fields have imbedded commas. What's the best way to do this?

for example: 1,"Hugo, Inc.",4,"This is a test",34.5,"Marc, Cindy and Rob"

Needs to be:

1 Hugo, Inc. 4 This is a test 34.5 Marc, Cindy and Rob
Obviously, I can't split just by ',' and I can't split by "," either. Suggestions?

Replies are listed 'Best First'.
Re: split problems
by infoninja (Friar) on Apr 11, 2001 at 20:50 UTC
      That breaks on embedded newlines.

      For a solution that solves that, check out Text::xSV (which I intend to put on CPAN this weekend, should have done it ages ago).

Re: split problems
by traveler (Parson) on Apr 11, 2001 at 21:28 UTC
    Checkout the examples in Perl Cookbook Recipe 1.15. One example using Text::ParseWords:
    use Text::ParseWords; sub parse_csv { return quoteword(",",0, $_[0]); }
    The args are:
    • "," the separator
    • 0 the elements can be quoted
    • $_[0] the string to process (the arg to the sub>
    (Although I'd use single quotes around the comma as it is not likely to contain a variable.)

    traveler

Re: split problems
by Chady (Priest) on Apr 11, 2001 at 21:15 UTC

    if you still wanna do this manually... try to export your database as a text special-combination-of-characters-delimited file.. try some combination that cannot be found in a file... I tried this once using ::$:: for delimiter, and I'm sure your text won't include these in sequence

    hope this helps


    He who asks will be a fool for five minutes, but he who doesn't ask will remain a fool for life.
Re: split problems
by mirod (Canon) on Apr 11, 2001 at 21:55 UTC

    If you want to manipulate the data using the DBI (to do SQL queries on it for example) you can also use DBD::RAM or DBD::CSV. This has the added bonus that if one day you want to access the data straight from the DB all you will have to change is the connection instruction.

Re: split problems
by DeusVult (Scribe) on Apr 11, 2001 at 22:50 UTC

    Easy, split by " :)

    Ok, this all assumes we don't have embedded parentheses in the quoted fields (If you do, it gets more complicated). But if you have a variable, $row, let's say, with the following value:

    1,"Hugo,Inc.",4,"This is a test",34.5, 17.9, "Marc, Cindy and Rob"

    Now, if you split this variable on ", and store it in @splitArray, then you will get the following:

    $splitArray[0] = 1, $splitArray[1] = Hugo, Inc. $splitArray[2] = ,4, $splitArray[3] = This is a test $splitArray[4] = ,34.5,17.9, $splitArray[5] = Marc, Cindy and Rob

    Now, notice that all of the even subscripts contain the entries that were not in quotes, while the odd subscripts contain the quoted strings. Therefore...

    use strict; my $row = "1, \"Hugo, Inc.\", 4, \"This is a test\", 34.5, 17.9, \"Mar +c, Cindy and Rob\""; print "ROW: $row\n"; my @splitArray = split '\"', $row; my @answerArray; for ( my $i = 0 ; $i <= $#splitArray ; $i++ ) { my @tempArray; if ( $i % 2 == 0 ) # if $i is even { @tempArray = split ',', $splitArray[$i]; } else # if $i is odd { @tempArray = ( $splitArray[$i] ); } # Shove all of @tempArray into @answerArray while ( @tempArray ) { if ( $i % 2 == 0 ) { $tempArray[0] =~ s/ //g; } if ( $tempArray[0] =~ /\S/ ) { push @answerArray, shift @tempArray; } else { shift @tempArray; } } } foreach my $answer ( @answerArray ) { print "$answer\n"; }

    There is some extra little cleanup garbage in there which you might not want, but that has the basic idea. Hope it helps.

    Some people drink from the fountain of knowledge, others just gargle.

      AHA!

      I can't always depend on the odd/even thing, because I'm looking for somewhat of a universal way to do this. But, if I first split by ", into a temp array, then go through the entries one by one, if they don't have commas either at the beginning or end (or both), push them into the real array, and if they do, strip off beginning and/or ending comma, and then split by , pushing the results into the real array. That should do it. I'll try it.

      Thanks!!