PerlMonks  

Stuck in my final step of code using array of arrays

by Anonymous Monk
on Mar 02, 2014 at 16:02 UTC ( [id://1076717]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks!
I have a file in the following format:
    ID:A0AWZ5
    HIT:PF12951 SCORE:40.0 EVALUE:2.2e-10 HMM_START:2 HMM_END:32 SEQ_START:421 SEQ_END:455
    HIT:PF03797 SCORE:130.7 EVALUE:3.6e-40 HMM_START:7 HMM_END:261 SEQ_START:822 SEQ_END:1073
    HIT:PF12951 SCORE:38.7 EVALUE:5.5e-10 HMM_START:1 HMM_END:32 SEQ_START:515 SEQ_END:547
    //

I have generated an array of arrays from this file:
    $/="//\n";
    while(<>)
    {
        if($_=~/^ID:(.*)/m)
        {
            $id=$1;
            $seq=$hash_f{$id};
            @AoA=();
            while($_=~/^HIT:(.*?)\tSCORE:[\d\.]+\tEVALUE:[\d\.\-\w]+\tHMM_START:\d+\tHMM_END:\d+\tSEQ_START:(\d+)\tSEQ_END:(\d+)/mg)
            {
                $pfam_name=$1;
                $seq_start=$2;
                $seq_end=$3;
                push @AoA, [ $pfam_name, $seq_start, $seq_end ];
            }
        }
        print $id."\n";
        for $i ( 0 .. $#AoA )
        {
            print "$i\t [ @{$AoA[$i]} ],\n";
        }
    }
    $/="\n";

which prints:
    A0AWZ5
    0    [ PF12951 421 455 ],
    1    [ PF03797 822 1073 ],
    2    [ PF12951 515 547 ],

My question is the following:
Some of the codes (PF03797, for example) are treated differently from the rest in my program. When I find such a code (I have them stored in a hash), I need to find the elements in the array of arrays that come before and after it (if any) and print the range from the end of the preceding element to the start of the subsequent one.
For example, for A0AWZ5 above, PF03797 is one of these "important" codes, so I would need to print the range from 548 (one past the end of the preceding element in the array of arrays) until the end (since there is no subsequent element in this case).

Replies are listed 'Best First'.
Re: Stuck in my final step of code using array of arrays
by kcott (Archbishop) on Mar 03, 2014 at 01:08 UTC

    I strongly recommend you:

    • use the strict and warnings pragmata in all your scripts
    • localise changes to special variables (such as $/)
    • use lexical variables (usually my) instead of relying on package (global) variables

    I think this should be fairly close to what you want (and shows examples of all the recommendations I made above):

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my %special = (PF03797 => 1);

    {
        local $/ = "//\n";

        while (<DATA>) {
            my ($id) = /^ID:(\w+)/;
            my @data;

            while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                push @data, [ $1, $2, $3 ];
            }

            @data = sort { $a->[2] <=> $b->[2] } @data;

            for my $i (0 .. $#data) {
                if ($special{$data[$i][0]}) {
                    my $start = $i ? $data[$i - 1][2] + 1 : 'none';
                    print join "\t" => $id, $data[$i][0], $start, $data[$i][2];
                }
            }
        }
    }

    __DATA__
    ID:A0AWZ5
    HIT:PF12951 SCORE:40.0 EVALUE:2.2e-10 HMM_START:2 HMM_END:32 SEQ_START:421 SEQ_END:455
    HIT:PF03797 SCORE:130.7 EVALUE:3.6e-40 HMM_START:7 HMM_END:261 SEQ_START:822 SEQ_END:1073
    HIT:PF12951 SCORE:38.7 EVALUE:5.5e-10 HMM_START:1 HMM_END:32 SEQ_START:515 SEQ_END:547
    //

    Output:

    A0AWZ5 PF03797 548 1073

    -- Ken

      Thank you so much! I think this will be really helpful for me!
        I tried the code and it works perfectly!
        One last question though:
        Suppose you have this list:
            HIT:PF12951 SEQ_START:120 SEQ_END:350
            HIT:PF03797 SEQ_START:822 SEQ_END:1073
            HIT:PF15789 SEQ_START:1515 SEQ_END:1547
            HIT:PF00267 SEQ_START:1200 SEQ_END:1350

        where there are codes not only before but also after the wanted one (PF03797). In this case the desired range would be 351-1199 (350 is the end of the previous element and 1200 is the start of the next element).
        How can I take both of them into account? I tried the following without success:
            use strict;
            use warnings;

            my %special = (PF03797 => 1);

            {
                local $/ = "//\n";

                while (<DATA>) {
                    my ($id) = /^ID:(\w+)/;
                    my @data;

                    while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                        push @data, [ $1, $2, $3 ];
                    }

                    @data = sort { $a->[2] <=> $b->[2] } @data;

                    for my $i (0 .. $#data) {
                        my $start;
                        my $end;
                        #print $data[$i][0]."\n";
                        if ($special{$data[$i][0]}) {
                            print $data[$i][2]."\n";
                            if($start=$i) {
                                $start = $data[$i - 1][2] - 1;
                            }
                            else {
                                $start = $data[$i][1] - 1;
                            }
                            if($end=$i) {
                                $end = $data[$i][2] - 1;
                            }
                            else {
                                $end = $data[$i + 1][1] - 1;
                            }
                            print join "\t" => $id, $data[$i][0], $start, $end;
                        }
                    }
                }
            }
            print "\n";

            __DATA__
            ID:A0AWZ5
            HIT:PF12951 SEQ_START:120 SEQ_END:350
            HIT:PF03797 SEQ_START:822 SEQ_END:1073
            HIT:PF15789 SEQ_START:1515 SEQ_END:1547
            HIT:PF00267 SEQ_START:1200 SEQ_END:1350
            //
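
The attempt above seems to go wrong in three places. `if ($start = $i)` assigns `$i` instead of testing it. The start should be the previous end plus one, not minus one. The end test should check whether `$i` is the last index. One possible sketch of the two-sided lookup follows. It assumes the hits do not overlap, so sorting by SEQ_START gives the same order as sorting by SEQ_END. The helper name `flanking_ranges` is hypothetical, and the fallbacks (`'none'` when nothing precedes, the hit's own SEQ_END when nothing follows) simply mirror the behaviour of the earlier reply.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: only PF03797 is named as "special" in this thread.
my %special = (PF03797 => 1);

# Hypothetical helper: takes a list of [ name, seq_start, seq_end ] and
# returns one [ name, range_start, range_end ] per special hit.
sub flanking_ranges {
    my @hits = sort { $a->[1] <=> $b->[1] } @_;    # order by SEQ_START
    my @ranges;

    for my $i (0 .. $#hits) {
        my ($name, $start, $end) = @{ $hits[$i] };
        next unless $special{$name};

        # One past the end of the previous hit, or 'none' if there is none.
        my $from = $i > 0 ? $hits[$i - 1][2] + 1 : 'none';

        # One before the start of the next hit, or this hit's own end
        # when it is the last one.
        my $to = $i < $#hits ? $hits[$i + 1][1] - 1 : $end;

        push @ranges, [ $name, $from, $to ];
    }
    return @ranges;
}

my @hits = (
    [ 'PF12951', 120,  350 ],
    [ 'PF03797', 822,  1073 ],
    [ 'PF15789', 1515, 1547 ],
    [ 'PF00267', 1200, 1350 ],
);

print join("\t", 'A0AWZ5', @$_), "\n" for flanking_ranges(@hits);
```

For the four hits listed in the follow-up this should print A0AWZ5, PF03797, 351 and 1199 separated by tabs. For the original three-hit example it gives 548 and 1073, matching the earlier output.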
Re: Stuck in my final step of code using array of arrays
by Laurent_R (Canon) on Mar 02, 2014 at 20:03 UTC
    Sorting your array of arrays, based on your answers to previous questions, demonstrated under the perl debugger:

      DB<2> @c = ([ PF12951, 421, 455 ], [ PF03797, 822, 1073 ], [ PF12951, 515, 547 ]);

      DB<3> x @c
    0  ARRAY(0x80359d28)
       0  'PF12951'
       1  421
       2  455
    1  ARRAY(0x803601b8)
       0  'PF03797'
       1  822
       2  1073
    2  ARRAY(0x803603b0)
       0  'PF12951'
       1  515
       2  547

      DB<4> @d = sort {$a->[1] <=> $b->[1]} @c

      DB<5> x @d
    0  ARRAY(0x80359d28)
       0  'PF12951'
       1  421
       2  455
    1  ARRAY(0x803603b0)
       0  'PF12951'
       1  515
       2  547
    2  ARRAY(0x803601b8)
       0  'PF03797'
       1  822
       2  1073

      DB<6>
    Once the sub-arrays are sorted, it is quite easy to pick up the end of the previous element.
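
As a rough illustration of that last step, here is the same sort as a standalone script, followed by a lookup of the element just before each PF03797 entry. The variable names `@idx` and `$prev_end` are made up for this sketch, and PF03797 is hard-coded only because it is the code named in the question.

```perl
use strict;
use warnings;

my @c = (
    [ 'PF12951', 421, 455 ],
    [ 'PF03797', 822, 1073 ],
    [ 'PF12951', 515, 547 ],
);

# Same sort as in the debugger session: order by start position.
my @d = sort { $a->[1] <=> $b->[1] } @c;

# Positions of the PF03797 entries in the sorted array.
my @idx = grep { $d[$_][0] eq 'PF03797' } 0 .. $#d;

for my $i (@idx) {
    # The previous element exists only when $i is not the first index.
    my $prev_end = $i > 0 ? $d[$i - 1][2] : undef;
    print defined $prev_end
        ? "previous end: $prev_end\n"
        : "no previous element\n";
}
```

With the three sub-arrays above, the PF03797 entry ends up last after sorting, so the previous end is 547 and the wanted range starts at 548.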
Re: Stuck in my final step of code using array of arrays
by Kenosis (Priest) on Mar 02, 2014 at 17:35 UTC

    You've shown what your script prints. Given your specs, can you additionally show what you'd like it to print?

      Didn't I mention it??? Hm, maybe it wasn't clear... :)
      I would like to print:
      A0AWZ5\tPF03797\t548\t1073
        I can guess where the 1073 value is coming from, but can't see where the 548 is coming from (except, possibly, that it is one more than the end of the next element).

Node Type: perlquestion [id://1076717]
Approved by hdb