vikee has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm having problems parsing some data. Can anybody help me? This is the data structure:

    COLUMN_A: "Y","N"
    COLUMN_B:
    COLUMN_C: "something", "something else", "and more"
    ...

I need to separate them into a two-dimensional array of strings like:

    $array[0][0] = "COLUMN_A";
    $array[0][1] = "Y";
    $array[0][2] = "N";
    $array[1][0] = "COLUMN_B";
    $array[2][0] = "COLUMN_C";
    $array[2][1] = "something";
    $array[2][2] = "something else";
    $array[2][3] = "and more";

The problem is that I need to get rid of the colon, the commas, and the quotes.

Each line begins with the column name, possibly preceded by whitespace that I don't need. Then come the parameters, separated by commas; I don't need the quotes around them either, as in the example above. There can be any number of parameters, from zero upwards.

I'm a beginner, so help me, please.

Edited by Chady -- removed <pre> tags, fixed formatting.

Replies are listed 'Best First'.
Re: data parsing
by davorg (Chancellor) on Jul 21, 2004 at 14:49 UTC

    Use Text::ParseWords.

    #!/usr/bin/perl
    use Text::ParseWords;
    use Data::Dumper;

    my @data;

    while (<DATA>) {
      chomp;
      s/^\s+//;
      s/[\s:,]+\s*$//;
      push @data, [parse_line '[\s:,]+', 0, $_];
    }

    print Dumper \@data;

    __DATA__
    COLUMN_A: "Y","N"
    COLUMN_B:
    COLUMN_C: "something", "something else", "and more"
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: data parsing
by jZed (Prior) on Jul 21, 2004 at 16:13 UTC
    Can you have data like this?

    COLUMN_C: "something","something,with a comma", "other stuff"

    Or like this?

    COLUMN_C: "something","something\"with a quote", "other stuff"

    If so, I recommend that you use a CSV parsing module.

      Text::ParseWords handles both of those possibilities without any problems - and it's a standard part of the Perl distribution.

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg
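To make that claim concrete, here is a minimal sketch that feeds the two lines from the question above through `parse_line`, using the same trimming and the same delimiter pattern as the first reply. The helper name `parse_row` is made up for this illustration; only `parse_line` comes from Text::ParseWords.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper (not part of any module): trim one line, then
# split it on whitespace, colons, and commas outside of quotes.
sub parse_row {
    my ($line) = @_;
    $line =~ s/^\s+//;        # drop leading whitespace
    $line =~ s/[\s:,]+$//;    # drop trailing delimiters, so "COLUMN_B:" gives one field
    return parse_line('[\s:,]+', 0, $line);   # 0 = strip quotes and backslashes
}

# Single-quoted Perl strings keep the backslash in \" as-is.
my @with_comma = parse_row('COLUMN_C: "something","something,with a comma", "other stuff"');
my @with_quote = parse_row('COLUMN_C: "something","something\"with a quote", "other stuff"');

print scalar(@with_comma), " fields: $with_comma[2]\n";
print scalar(@with_quote), " fields: $with_quote[2]\n";
```

Both lines should come back as four fields, with the embedded comma and the embedded quote left inside the third field. Note that an unquoted value containing a space or a colon would still be split, since both are in the delimiter pattern.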

Re: data parsing
by pbeckingham (Parson) on Jul 21, 2004 at 14:58 UTC

    How about this:

    #! /usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my %data;

    while (<DATA>) {
      chomp;
      # Column name up to the colon, then the rest of the line
      my ($column, $value) = /^\s*(\S+):\s*(.*?)$/;
      # Capture the text between each pair of double quotes
      my @fields = $value =~ /"([^"]*)"/g;
      push @{$data{$column}}, @fields;
    }

    print Dumper (\%data);

    __DATA__
    COLUMN_A: "Y","N"
    COLUMN_B:
    COLUMN_C: "something", "something else", "and more"
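The question asked for a two-dimensional array, whereas the code above builds a hash keyed by column name, which does not preserve line order. The following is a minimal sketch of the same regex idea producing an array of array references instead. The sub name `parse_rows` is invented for this example, and it assumes values never contain an escaped double quote.

```perl
use strict;
use warnings;

# Hypothetical helper: one array ref per input line,
# column name first, then the quoted values with quotes removed.
sub parse_rows {
    my @rows;
    for my $line (@_) {
        # Skip lines that do not look like "NAME: ..."
        my ($column, $value) = $line =~ /^\s*(\S+?):\s*(.*)$/ or next;
        # In list context the /g match returns every captured value
        push @rows, [ $column, $value =~ /"([^"]*)"/g ];
    }
    return @rows;
}

my @array = parse_rows(
    '  COLUMN_A: "Y","N"',
    'COLUMN_B:',
    'COLUMN_C: "something", "something else", "and more"',
);

print "$array[0][1]\n";            # Y
print scalar(@{ $array[1] }), "\n"; # 1 (only the column name)
print "$array[2][2]\n";            # something else
```

Individual elements are read with `$array[$i][$j]`; the `@array[...]` form in the question is an array slice and is not what is wanted here.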

Re: data parsing
by murugu (Curate) on Jul 22, 2004 at 04:05 UTC

    Just my try.

    Note: the code below does not work if the words contain embedded commas.

    use strict;

    my @array;

    while (<DATA>) {
      chomp;
      s,",,g;
      push @array, [ split /[:,]\s?/ ];
    }

    __DATA__
    COLUMN_A: "Y","N"
    COLUMN_B:
    COLUMN_C: "something", "something else", "and more"

    Murugesan


    s,,,s,y,y,,,s,,yyymssusyyyyyryyssuysyyysygyeypsyaynyss,s,,,y,y,,d,y,s,,d,y,p,s,d&&print
Re: data parsing
by xorl (Deacon) on Jul 21, 2004 at 15:29 UTC
    The above answers are fine, but the below just seems more logical to me, and it doesn't use any weird modules:
    #!/usr/bin/perl
    my $i=0; #line number
    my $j=0; #field number
    my @array;

    while (<DATA>) {
      chomp;
      my @colon = split(":");
      $j=0;
      $array[$i][$j] = $colon[0];
      my @commas = split(",", $colon[1]);
      foreach my $field (@commas) {
        $j++;
        $array[$i][$j]=$field;
      }
      $i++;
    }

    ## Just for printing
    my $a=0;
    my $b=0;
    foreach my $fields (@array) {
      $b=0;
      foreach my $field (@$fields) {
        print "FIELD ($a,$b) = " . $field . "\n";
        $b++;
      }
      $a++;
    }
    ## End printing section

    __DATA__
    COLUMN_A: "Y","N"
    COLUMN_B:
    COLUMN_C: "something", "something else", "and more"

      This breaks if the data items contain embedded commas. That's why I used Text::ParseWords (which isn't a weird module, it comes as standard with Perl).

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg
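As a small illustration of the embedded-comma problem mentioned above, this sketch splits one line from the earlier question in two ways: naively on every comma, and by capturing the text between double quotes. The variable names are made up for the example, and the quote-capturing regex is only an assumption-level alternative that does not handle escaped quotes.

```perl
use strict;
use warnings;

my $line = 'COLUMN_C: "something","something,with a comma", "other stuff"';

# Separate the column name from the rest at the first colon only
my ($name, $rest) = split /:/, $line, 2;

# Naive approach: every comma is treated as a separator
my @naive = split /,/, $rest;

# Quote-aware approach: take whatever sits between pairs of double quotes
my @quoted = $rest =~ /"([^"]*)"/g;

print "naive split:  ", scalar(@naive),  " pieces\n";   # 4 pieces
print "quoted match: ", scalar(@quoted), " values\n";   # 3 values
print "second value: $quoted[1]\n";                     # something,with a comma
```

The naive split cuts the second value in two and leaves the quotes and leading spaces in place, which is the breakage described above.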