monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I want to parse the text below (after __DATA__). As you can see, the first consensus (AGAGAAAG 1) contains two pattern sets and the second consensus (AGAAACAG 1) contains only one. What I intend to do is parse the header of each set (as @temp):

CONSENSUS: AGAGAAAG 1, Horz OCC: 7
ZSCORE: 0.142138988325903
Number of Submotifs: 3
Vertical Occurence: 10
Score: 7.66666666666667

and copy it into every instance of the set that it belongs to.
See the desired answer below. Why doesn't my code do the job?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;

my %bighash;
my $org = 'mus09r';
my @all;
my @temp;
my @temp2;
my @ins_grp;
my ( $cons, $hor_oc );

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ /CONSENSUS: \s ([ATCGSN]+ \s \d+), \s Horz \s OCC: \s (\d+)/xms ) {
        @temp = ();
        ( $cons, $hor_oc ) = ( $1, $2 );
        push @temp, ( $cons, $hor_oc );
    }
    elsif ( $line =~ /ZSCORE (?:.*) (\d+\.\d+|NA)$/xms ) {
        my $zscore = $1;
        push @temp, $zscore;
    }
    elsif ( $line =~ /^Number (?:.*) (\d+)/xms ) {
        my $nb_sb = $1;
        push @temp, $nb_sb;
    }
    elsif ( $line =~ /^Vertical (?:.*) (\d+)/xms ) {
        my $vert_occ = $1;
        push @temp, $vert_occ;
    }
    elsif ( $line =~ /Score: \s (.*)$/xms ) {
        my $score = $1;
        push @ins_grp, $score;
    }
    elsif ( $line =~ /\d+,-\d+,[ATCG]+$/xms ) {
        push @ins_grp, $line;
    }
    elsif ( $line =~ /^$/xms ) {
        push @temp2, [(@temp,@ins_grp)] if (@ins_grp);
        @ins_grp = ();
    }
    elsif ( $line =~ /\^+/xms ) {
        push @all, [@temp2];
    }
}

$bighash{$org} = [@all];
print Dumper \%bighash ;

__DATA__
Total Consensus: 448
CONSENSUS: AGAGAAAG 1, Horz OCC: 7
ZSCORE: 0.142138988325903
Number of Submotifs: 3
Vertical Occurence: 10

Score: 7.66666666666667
PATTERN: 3 B -2 A
1,-424,CAGAGACAGGGGAGAGATAG
1,-338,AAGAGAAAGGGAGGAGAGGC
1,-349,AAGAGGGGAGGAAGAGAAAG

Score: 7.66666666666667
PATTERN: 3 C -3 F -4 E
1,-337,AGAGAAAGGGAGGAGAGGCA
1,-348,AGAGGGGAGGAAGAGAAAGG
1,-423,AGAGACAGGGGAGAGATAGA

^^^^^^^^^^^^^^
CONSENSUS: AGAAACAG 1, Horz OCC: 7
ZSCORE: 1.36112386682747
Number of Submotifs: 4
Vertical Occurence: 4

Score: 7.66666666666667
PATTERN: 3 C -2 F
1,-383,TGAGAAACAG
1,-319,CAAGAAACAG
0,-457,CTTGAAACAG

^^^^^^^^^^^^^^


Desired answer:
$VAR1 = {
    'mus09r' => [
        [
            'AGAGAAAG 1',
            '7',
            '0.142138988325903',
            '3',
            '0',
            [
                '7.66666666666667',
                '1,-337,AGAGAAAGGGAGGAGAGGCA',
                '1,-348,AGAGGGGAGGAAGAGAAAGG',
                '1,-423,AGAGACAGGGGAGAGATAGA'
            ]
        ],
        [
            'AGAGAAAG 1',
            '7',
            '0.142138988325903',
            '3',
            '0',
            [
                '7.66666666666667',
                '1,-424,CAGAGACAGGGGAGAGATAG',
                '1,-338,AAGAGAAAGGGAGGAGAGGC',
                '1,-349,AAGAGGGGAGGAAGAGAAAG'
            ],
        ],
        [
            'AGAAACAG 1',
            '7',
            '1.36112386682747',
            '4',
            '4',
            [
                '7.66666666666667',
                '1,-383,TGAGAAACAG',
                '1,-319,CAAGAAACAG',
                '0,-457,CTTGAAACAG'
            ]
        ]
    ]
};

Regards,
Edward

Replies are listed 'Best First'.
Re: Problem of Duplicating an Array while Parsing Text
by GrandFather (Saint) on Dec 20, 2005 at 04:08 UTC

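    You want to push an AoA (array of arrays): wrap @ins_grp in its own anonymous array so that it stays a nested list instead of being flattened in after the elements of @temp: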

    elsif ( $line =~ /^$/xms ) {
        push @temp2, [(@temp,[@ins_grp])] if (@ins_grp);
        @ins_grp = ();
    }
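
    To illustrate why the inner brackets matter, here is a minimal sketch with made-up values. @header and @instances are hypothetical stand-ins for @temp and @ins_grp, not names from the original code. Arrays listed inside an anonymous array constructor are flattened into one list, whereas a nested [ ... ] keeps the group together as a single element.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for @temp and @ins_grp from the question.
my @header    = ( 'AGAGAAAG 1', '7' );
my @instances = ( '7.67', '1,-424,CAGAG' );

# Both arrays are flattened into one list: four scalar elements.
my $flat = [ ( @header, @instances ) ];

# The inner [ ... ] copies @instances into its own anonymous array,
# so the result has three elements, the last being an array reference.
my $nested = [ ( @header, [@instances] ) ];

print scalar( @{$flat} ),   "\n";    # 4
print scalar( @{$nested} ), "\n";    # 3
print ref( $nested->[2] ),  "\n";    # ARRAY
```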

    DWIM is Perl's answer to Gödel
      Thanks, GrandFather, but the output it produces is still not quite right.

      Regards,
      Edward

        Your push onto @all introduces an extra level of nesting, and you also need to flush @temp2 once its contents have been pushed:

        elsif ( $line =~ /\^+/xms ) {
            push @all, @temp2;
            @temp2 = ();
        }

        With both sets of changes applied to your original code it now prints:

        $VAR1 = {
            'mus09r' => [
                [
                    'AGAGAAAG 1',
                    '7',
                    '0.142138988325903',
                    '3',
                    '0',
                    [
                        '7.66666666666667',
                        '1,-424,CAGAGACAGGGGAGAGATAG',
                        '1,-338,AAGAGAAAGGGAGGAGAGGC',
                        '1,-349,AAGAGGGGAGGAAGAGAAAG'
                    ]
                ],
                [
                    'AGAGAAAG 1',
                    '7',
                    '0.142138988325903',
                    '3',
                    '0',
                    [
                        '7.66666666666667',
                        '1,-337,AGAGAAAGGGAGGAGAGGCA',
                        '1,-348,AGAGGGGAGGAAGAGAAAGG',
                        '1,-423,AGAGACAGGGGAGAGATAGA'
                    ]
                ],
                [
                    'AGAAACAG 1',
                    '7',
                    '1.36112386682747',
                    '4',
                    '4',
                    [
                        '7.66666666666667',
                        '1,-383,TGAGAAACAG',
                        '1,-319,CAAGAAACAG',
                        '0,-457,CTTGAAACAG'
                    ]
                ]
            ]
        };
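
        The effect of the push and flush change can be seen in isolation in the following sketch. The @blocks data and the variable names are hypothetical placeholders, not the poster's input. Pushing [@sets] adds one array level per consensus block and lets earlier sets pile up, while pushing the list itself and then emptying it yields one top-level entry per set.

```perl
use strict;
use warnings;

# Hypothetical input: two consensus blocks, each holding set records.
my @blocks = (
    [ [ 'cons A', 'set 1' ], [ 'cons A', 'set 2' ] ],
    [ [ 'cons B', 'set 1' ] ],
);

# Behaviour of the original code: wrap @sets in a new anonymous array
# at every block separator and never empty it.
my ( @sets, @all );
for my $block (@blocks) {
    push @sets, @{$block};
    push @all, [@sets];
}
# @all now has 2 entries (one per block), and the second entry holds
# 3 sets because the first block's sets were never flushed.

# Behaviour after the suggested change: push the set records themselves
# and flush the buffer before the next block.
my ( @sets2, @all2 );
for my $block (@blocks) {
    push @sets2, @{$block};
    push @all2, @sets2;
    @sets2 = ();
}
# @all2 now has 3 entries, one per set.

print scalar(@all), " ", scalar( @{ $all[1] } ), " ", scalar(@all2), "\n";    # 2 3 3
```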

        DWIM is Perl's answer to Gödel
        I think your desired output doesn't match your input: the two pattern sets of the first consensus are listed in the opposite order from the data. Recheck them, because you seem to have got them switched. Good luck.
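
        A related mismatch, offered as an observation rather than something raised in the thread: the data says "Vertical Occurence: 10", yet the fifth field of the first consensus is '0'. This is consistent with the greedy (?:.*) in the question's pattern leaving only the last digit for the capture group. The sketch below shows that behaviour; the second pattern is one possible alternative and an assumption, not the poster's code.

```perl
use strict;
use warnings;

my $line = 'Vertical Occurence: 10';

# Pattern from the question: the greedy .* swallows as much as it can,
# leaving only the final digit for the capture group.
my ($greedy) = $line =~ /^Vertical (?:.*) (\d+)/xms;

# One possible alternative (an assumption, not from the thread): match the
# colon and whitespace explicitly so the whole number is captured.
my ($anchored) = $line =~ /^Vertical .* : \s* (\d+) $/xms;

print "$greedy\n";      # 0
print "$anchored\n";    # 10
```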
Re: Problem of Duplicating an Array while Parsing Text
by QM (Parson) on Dec 20, 2005 at 17:05 UTC
    While this won't fix your logic problems, your parsing might be simpler (and easier to maintain and debug) if you used something like Parse::RecDescent.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of