in reply to Re: Stuck in my final step of code using array of arrays
in thread Stuck in my final step of code using array of arrays

Thank you so much! I think this will be really helpful for me!
Re^3: Stuck in my final step of code using array of arrays
by Anonymous Monk on Mar 04, 2014 at 01:47 UTC
    I tried the code and it works perfectly!
    One last question though:
    Suppose you have this list:
    HIT:PF12951 SEQ_START:120 SEQ_END:350 HIT:PF03797 SEQ_START:822 SEQ_END:1073 HIT:PF15789 SEQ_START:1515 SEQ_END:1547 HIT:PF00267 SEQ_START:1200 SEQ_END:1350

    where there are codes not only before but also after the wanted one (PF03797). In this case the desired range would be between 351-1199 (350 is the end of the previous element and 1200 is the start of the next element).
    How can I take both of them? I tried the following without success:

    use strict;
    use warnings;

    my %special = (PF03797 => 1);

    {
        local $/ = "//\n";

        while (<DATA>) {
            my ($id) = /^ID:(\w+)/;

            my @data;
            while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                push @data, [ $1, $2, $3 ];
            }

            @data = sort { $a->[2] <=> $b->[2] } @data;

            for my $i (0 .. $#data) {
                my $start;
                my $end;
                #print $data[$i][0]."\n";
                if ($special{$data[$i][0]}) {
                    print $data[$i][2]."\n";
                    if($start=$i) {
                        $start = $data[$i - 1][2] - 1;
                    }
                    else {
                        $start = $data[$i][1] - 1;
                    }
                    if($end=$i) {
                        $end = $data[$i][2] - 1;
                    }
                    else {
                        $end = $data[$i + 1][1] - 1;
                    }
                    print join "\t" => $id, $data[$i][0], $start, $end;
                }
            }
        }
    }
    print "\n";
    __DATA__
    ID:A0AWZ5
    HIT:PF12951 SEQ_START:120 SEQ_END:350
    HIT:PF03797 SEQ_START:822 SEQ_END:1073
    HIT:PF15789 SEQ_START:1515 SEQ_END:1547
    HIT:PF00267 SEQ_START:1200 SEQ_END:1350
    //
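      With a change to kcott's solution, this can be achieved. Your code suggests you haven't understood the ternary operator, condition ? 'if true' : 'if false', which kcott's code used. Your replacement also uses = (assignment) where a comparison (==) is needed, so the condition only tests whether $i is non-zero. The two lines with the error are: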

      if($start=$i)

      and

      if($end=$i)

      Note how I wrote the lines for my $start and my $end:

      my $start = $i == 0      ? 'none' : $data[$i - 1][2] + 1;
      my $end   = $i == $#data ? 'none' : $data[$i + 1][1] - 1;
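      To make the difference concrete, here is a minimal sketch contrasting the assignment form with the comparison form. The sub names assignment_test and comparison_test are hypothetical and exist only for this illustration; they are not part of kcott's script.

```perl
use strict;
use warnings;

# Hypothetical helper: behaves like `if ($start = $i)`.
# `=` copies $i into $start, then the condition tests the truth of that value.
sub assignment_test {
    my ($i) = @_;
    my $start;
    if ($start = $i) { return 'true branch' }   # deliberate bug: assignment
    return 'false branch';
}

# Hypothetical helper: the ternary with a real comparison.
sub comparison_test {
    my ($i) = @_;
    return $i == 0 ? 'first element' : 'has a predecessor';
}

for my $i (0 .. 2) {
    printf "i=%d  assignment: %-12s  comparison: %s\n",
        $i, assignment_test($i), comparison_test($i);
}
```

      With the assignment form the branch taken depends only on whether $i is zero or not, and $start is overwritten with the index as a side effect. It never asks whether the element is the first or the last one.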

      You could rewrite your code to be:

      if ($special{$data[$i][0]}) {
          if($i == 0) {
              $start = 'none';
          }
          else {
              # one past the end of the previous element
              $start = $data[$i - 1][2] + 1;
          }
          if($i == $#data) {
              $end = 'none';
          }
          else {
              # one before the start of the next element
              $end = $data[$i + 1][1] - 1;
          }
          print join "\t" => $id, $data[$i][0], $start, $end;
      }

      The solution would be:

      #!/usr/bin/env perl -l

      use strict;
      use warnings;

      my %special = (PF03797 => 1);

      {
          local $/ = "//\n";

          while (<DATA>) {
              my ($id) = /^ID:(\w+)/;

              my @data;
              while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                  push @data, [ $1, $2, $3 ];
              }

              @data = sort { $a->[2] <=> $b->[2] } @data;

              for my $i (0 .. $#data) {
                  if ($special{$data[$i][0]}) {
                      my $start = $i == 0      ? 'none' : $data[$i - 1][2] + 1;
                      my $end   = $i == $#data ? 'none' : $data[$i + 1][1] - 1;
                      print join "\t" => $id, $data[$i][0], $start, $end;
                  }
              }
          }
      }

      __DATA__
      ID:A0AWZ5
      HIT:PF12951 SEQ_START:120 SEQ_END:350
      HIT:PF03797 SEQ_START:822 SEQ_END:1073
      HIT:PF15789 SEQ_START:1515 SEQ_END:1547
      HIT:PF00267 SEQ_START:1200 SEQ_END:1350
      //

      The output from my code was:

      A0AWZ5  PF03797 351     1199

      Hope this helps.

      I think ++Cristoforo has provided the appropriate changes required (in Re^4: Stuck in my final step of code using array of arrays).

      Now that I see another example of input and expected output, I suspect 'none' is incorrect for either the start or the end of the range. I originally used this (in Re: Stuck in my final step of code using array of arrays) based on your description containing "... before and after them (if any) ..." in the OP.

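      Here's another script that uses virtually the same changes as Cristoforo supplied, but replaces 'none' with the values I think you want: where there is no code before (or after) the special code, the range starts (or ends) at the special code's own SEQ_START (or SEQ_END). I've included additional test data to cover the four cases with and without codes before and after the special code.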

      #!/usr/bin/env perl

      use strict;
      use warnings;

      my %special = (PF03797 => 1);

      {
          local $/ = "//\n";

          while (<DATA>) {
              my ($id) = /^ID:(\w+)/;

              my @data;
              while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                  push @data, [ $1, $2, $3 ];
              }

              @data = sort { $a->[2] <=> $b->[2] } @data;

              for my $i (0 .. $#data) {
                  if ($special{$data[$i][0]}) {
                      my $start = $i == 0      ? $data[$i][1] : $data[$i - 1][2] + 1;
                      my $end   = $i == $#data ? $data[$i][2] : $data[$i + 1][1] - 1;
                      printf "%-41s %7s %4d %4d\n" => $id, $data[$i][0], $start, $end;
                  }
              }
          }
      }

      __DATA__
      ID:A0AWZ5_1___codes_before_only
      HIT:PF12951 SCORE:40.0 EVALUE:2.2e-10 HMM_START:2 HMM_END:32 SEQ_START:421 SEQ_END:455
      HIT:PF03797 SCORE:130.7 EVALUE:3.6e-40 HMM_START:7 HMM_END:261 SEQ_START:822 SEQ_END:1073
      HIT:PF12951 SCORE:38.7 EVALUE:5.5e-10 HMM_START:1 HMM_END:32 SEQ_START:515 SEQ_END:547
      //
      ID:A0AWZ5_2___codes_before_and_after
      HIT:PF12951 SEQ_START:120 SEQ_END:350
      HIT:PF03797 SEQ_START:822 SEQ_END:1073
      HIT:PF15789 SEQ_START:1515 SEQ_END:1547
      HIT:PF00267 SEQ_START:1200 SEQ_END:1350
      //
      ID:A0AWZ5_3___codes_after_only
      HIT:PF03797 SEQ_START:822 SEQ_END:1073
      HIT:PF15789 SEQ_START:1515 SEQ_END:1547
      HIT:PF00267 SEQ_START:1200 SEQ_END:1350
      //
      ID:A0AWZ5_4___codes_neither_before_nor_after
      HIT:PF03797 SEQ_START:822 SEQ_END:1073
      //

      Output:

      A0AWZ5_1___codes_before_only              PF03797  548 1073
      A0AWZ5_2___codes_before_and_after         PF03797  351 1199
      A0AWZ5_3___codes_after_only               PF03797  822 1199
      A0AWZ5_4___codes_neither_before_nor_after PF03797  822 1073
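
      If the same range calculation is needed in more than one place, it could be pulled into a sub. The following is only a sketch: the name gap_range and its argument layout (the wanted code, then a list of [code, start, end] array refs) are assumptions made for this example, not something from the scripts in this thread. It returns the range for the first matching hit only.

```perl
use strict;
use warnings;

# Hypothetical helper: given the wanted code and hits as [code, start, end],
# return (start, end) of the range around the wanted hit.
sub gap_range {
    my ($wanted, @hits) = @_;

    # Order by SEQ_END, the same sort key the scripts in this thread use.
    my @sorted = sort { $a->[2] <=> $b->[2] } @hits;

    for my $i (0 .. $#sorted) {
        next unless $sorted[$i][0] eq $wanted;
        # No predecessor: fall back to the hit's own start.
        my $start = $i == 0        ? $sorted[$i][1] : $sorted[$i - 1][2] + 1;
        # No successor: fall back to the hit's own end.
        my $end   = $i == $#sorted ? $sorted[$i][2] : $sorted[$i + 1][1] - 1;
        return ($start, $end);
    }
    return;    # wanted code not present: empty list
}

my @example = (
    [ PF12951 => 120,  350  ],
    [ PF03797 => 822,  1073 ],
    [ PF15789 => 1515, 1547 ],
    [ PF00267 => 1200, 1350 ],
);
my ($from, $to) = gap_range('PF03797', @example);
print "$from\t$to\n";    # prints 351 and 1199, tab-separated
```

      Returning an empty list when the code is absent lets the caller test the result with a simple list assignment in boolean context.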

      -- Ken

        Many thanks to both of you, greatly appreciated!!