kepler has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have a string of the type:
ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California"
from which I wish to extract several fields. As you can see, the fields are separated by commas, but some contain commas themselves, so split is out of the question. I'm trying this:
@data1 = ($line =~ m/ *([a-z]+) *, *(\d+) *, *(\d+) *, *\"(.*?)\" *, *([\d\.]*) *, *([\d\.]*) *, *([\d\.]*) *, *([\d\.] *) *, *([\d\.]*) *, *\"(.*?)\" */gi);
$d = $4; $g = $7; $j = $10; $h = $8; $e = $5; $f = $6;
I also tried
my ($a,$b,$c,$d,$e,$f,$g,$h,$i,$j) = @data1;
But I get null strings in the first case, and a full one - in $a - in the second. Can someone help me out? Kind regards, Kepler

Replies are listed 'Best First'.
Re: Parsing a string
by CountZero (Bishop) on Feb 22, 2011 at 16:34 UTC
    Text::CSV, don't even think of anything else.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I'd think of Text::CSV_XS :) At best, Text::CSV is a needless intermediary.

      (Upd: To clarify, using Text::CSV might be wiser if you're distributing a non-XS module or application, but for your own code, it can only introduce problems.)

        One reason could be no support for blank_is_undef/empty_is_undef :)
        My bad, tilly. I have never used your Text::xSV before, but if the opportunity presents itself, I will give it a spin.

        CountZero


Re: Parsing a string
by BrowserUk (Patriarch) on Feb 22, 2011 at 16:41 UTC

    Like this?

    $s = q[ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California"];;
    print for $s =~ m[("[^"]+"|[^,]+)(?:,|$)]g;;
    ci
    14938340
    2
    "Monday, February 21, 2011 19:58:06 UTC"
    34.6953
    -118.5350
    2.2
    17.40
     9
    "Southern California"

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Really nice :) Thanks - it's a wonderful solution and it's module-independent; I don't have the mentioned module on my web server (can you believe that???) Kind regards, Kepler

        Be aware that the given regular expression can break in some circumstances, for example when a quoted field contains a comma together with backslash-escaped quotes.

        Example:

        $s = q[ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California, \"US\""];
        print for $s =~ m[("[^"]+"|[^,]+)(?:,|$)]g;

        Output

        ci
        14938340
        2
        "Monday, February 21, 2011 19:58:06 UTC"
        34.6953
        -118.5350
        2.2
        17.40
         9
        "Southern California
         \"US\""
        --
        Regards
        - Samar
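
One possible way to cope with the case above, offered only as a sketch: extend the quoted alternative of the same pattern so that a backslash followed by any character counts as part of the field. The helper name `split_fields` is hypothetical and not from the thread. The sketch assumes backslash escaping (not doubled `""` quotes), and, like the original pattern, it skips empty fields.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: the pattern posted above with the
# quoted alternative extended to accept backslash-escaped characters.
sub split_fields {
    my ($line) = @_;
    # "(?:[^"\\]|\\.)*"  a quoted field; \" (or any \x) does not end it
    # [^,]+              an unquoted, non-empty field
    return $line =~ m{("(?:[^"\\]|\\.)*"|[^,]+)(?:,|$)}g;
}

my $s = q[ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California, \"US\""];
print "$_\n" for split_fields($s);
```

With this change the last field comes back in one piece, quotes and backslashes included. For anything beyond simple input, the CSV modules mentioned elsewhere in the thread remain the safer choice.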

      Yeah this was a really clean solution. Very nicely done.

      I was going to suggest something more complicated, like the code below, for extracting unique values. But you would have had to do it for each data type.

      # CODE for finding a number field of a certain length
      if (my @matches = $datainstring =~ m{ ([0-9]{12}) }xmsg) {
          print qq{matched @matches};
          push(@match2, @matches);
          foreach my $elem ( @match2 ) {
              next if $seen{ $elem }++;
              push @unique, $elem;
          } ### GET UNIQUES #####
      } # IF #
Re: Parsing a string
by kennethk (Abbot) on Feb 22, 2011 at 16:46 UTC
    The easiest answer to your issue is to suggest the use of one of the many CSV (comma-separated values) modules on CPAN. My preference is for Text::CSV. A sample which does what you request:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new()
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    my @rows;
    # Read the sample record from the __DATA__ section below
    open my $fh, "<&", *DATA or die "Clone failed";
    while ( my $row = $csv->getline($fh) ) {
        push @rows, $row;
    }
    $csv->eof or $csv->error_diag();
    # One field per line, with a blank line between records
    print join "\n\n", map { join "\n", @$_ } @rows;
    __DATA__
    ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California"
Re: Parsing a string
by johna (Monk) on Feb 22, 2011 at 18:38 UTC
    I think Text::ParseWords (core module) should work as well:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::ParseWords;

    my $s = q[ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California"];
    print join "\n", parse_line(",", 1, $s);
    print "\n";
    Outputs:
    ci
    14938340
    2
    "Monday, February 21, 2011 19:58:06 UTC"
    34.6953
    -118.5350
    2.2
    17.40
     9
    "Southern California"
    -John
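
Since the original question was about pulling individual fields into variables, here is a small sketch of how the Text::ParseWords approach might be taken one step further. Passing 0 as the second argument to `parse_line` drops the enclosing quotes. The variable names `$date`, `$region` and `@numbers` are assumptions chosen for illustration; the thread does not say what each column means.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q[ci,14938340,2,"Monday, February 21, 2011 19:58:06 UTC",34.6953,-118.5350,2.2,17.40, 9,"Southern California"];

# keep = 0: the enclosing double quotes are removed from quoted fields
my @fields = parse_line( ',', 0, $line );

# Trim stray whitespace, such as the space in front of the 9
s/^\s+|\s+$//g for @fields;

# Hypothetical names: fields 4 and 10 of the question, plus the numeric columns
my ( $date, $region ) = @fields[ 3, 9 ];
my @numbers = @fields[ 4 .. 8 ];

print "$date\n$region\n@numbers\n";
```

Because Text::ParseWords ships with Perl, this should also work on a server where no CSV module is installed.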
Re: Parsing a string
by Monkomatic (Sexton) on Feb 22, 2011 at 18:39 UTC

    To BrowserUK.

    Do you have something similar that would remove the commas for people who like to stick commas in their address?

    Input: Mark Williams 6/246 400 Albock road , Apt 2 West, Junction, 6 Alton, NM 60555
    Output: Mark Williams 6/246 400 Albock road Apt 2 West Junction 6 Alton NM 60555
      s/,//g; s/\s+/ /g;
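
The two substitutions above can be wrapped into a runnable script. The sketch below is a slight variation, not the poster's exact code: it replaces each comma with a space rather than deleting it, on the assumption that input such as `West,Junction` (no space after the comma) should stay two words. The helper name `strip_commas` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread
sub strip_commas {
    my ($text) = @_;
    $text =~ s/,/ /g;          # comma becomes a space, so adjacent words stay apart
    $text =~ s/\s+/ /g;        # collapse runs of whitespace into one space
    $text =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    return $text;
}

my $in = 'Mark Williams 6/246 400 Albock road , Apt 2 West, Junction, 6 Alton, NM 60555';
print strip_commas($in), "\n";
```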