I tried the code and it works perfectly!
One last question though:
Suppose you have this list:
HIT:PF12951 SEQ_START:120 SEQ_END:350
HIT:PF03797 SEQ_START:822 SEQ_END:1073
HIT:PF15789 SEQ_START:1515 SEQ_END:1547
HIT:PF00267 SEQ_START:1200 SEQ_END:1350

where there are codes not only before but also after the wanted one (PF03797). In this case the desired range is 351-1199: 350 is the end of the previous element, and 1200 is the start of the next element by position (PF00267, even though it comes last in the list).
How can I take both of them into account? I tried the following without success:
use strict;
use warnings;

my %special = (PF03797 => 1);

{
    local $/ = "//\n";

    while (<DATA>) {
        my ($id) = /^ID:(\w+)/;
        my @data;

        while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
            push @data, [ $1, $2, $3 ];
        }

        @data = sort { $a->[2] <=> $b->[2] } @data;

        for my $i (0 .. $#data) {
            my $start;
            my $end;
            #print $data[$i][0]."\n";
            if ($special{$data[$i][0]}) {
                print $data[$i][2]."\n";
                if($start=$i) {
                    $start = $data[$i - 1][2] - 1;
                }
                else {
                    $start = $data[$i][1] - 1;
                }
                if($end=$i) {
                    $end = $data[$i][2] - 1;
                }
                else {
                    $end = $data[$i + 1][1] - 1;
                }
                print join "\t" => $id, $data[$i][0], $start, $end;
            }
        }
    }
}
print "\n";

__DATA__
ID:A0AWZ5
HIT:PF12951 SEQ_START:120 SEQ_END:350
HIT:PF03797 SEQ_START:822 SEQ_END:1073
HIT:PF15789 SEQ_START:1515 SEQ_END:1547
HIT:PF00267 SEQ_START:1200 SEQ_END:1350
//

Re^4: Stuck in my final step of code using array of arrays
by Cristoforo (Curate) on Mar 04, 2014 at 02:53 UTC
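    With a change to kcott's solution, this can be done. Your code shows you don't understand the ternary operator, condition ? 'if true' : 'if false'. The two lines with your error are:

    if($start=$i)

    and

    if($end=$i)

    A single = is assignment, not comparison: if($start=$i) copies $i into $start and then tests that value, so the branch is taken for every index except 0 (the same goes for if($end=$i)). The tests you need are $i == 0 (nothing before the special code) and $i == $#data (nothing after it).

    Note how I wrote the lines for my $start and my $end:

    my $start = $i == 0      ? 'none' : $data[$i - 1][2] + 1;
    my $end   = $i == $#data ? 'none' : $data[$i + 1][1] - 1;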
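A small standalone demo (not part of the original replies) of the difference between `=` and `==` inside a condition, and of how the ternary picks its value. The array `@data` below holds the hits from the question, already sorted by SEQ_END as the scripts in this thread do; everything else is illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hits from the question, sorted by SEQ_END: [ HIT, SEQ_START, SEQ_END ]
my @data = (
    [ 'PF12951',  120,  350 ],
    [ 'PF03797',  822, 1073 ],
    [ 'PF00267', 1200, 1350 ],
    [ 'PF15789', 1515, 1547 ],
);

for my $i (0 .. $#data) {
    my $start;

    # Assignment: stores $i in $start, then tests the stored value,
    # so this is true for every index except 0.
    my $assign_test = ($start = $i) ? 'true' : 'false';

    # Comparison: true only for the first element.
    my $compare_test = ($i == 0) ? 'true' : 'false';

    print "i=$i  (\$start = \$i) is $assign_test  (\$i == 0) is $compare_test\n";
}

# The ternary returns the value left of ':' when the condition is true.
my $i     = 1;    # index of PF03797 in the sorted list
my $start = $i == 0      ? 'none' : $data[$i - 1][2] + 1;
my $end   = $i == $#data ? 'none' : $data[$i + 1][1] - 1;
print "$data[$i][0]: $start-$end\n";
```

The assignment test reports `false` only for `i=0`, while the comparison reports `true` only there; the last line prints `PF03797: 351-1199`.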

    You could rewrite your code to be:

    if ($special{$data[$i][0]}) {
        if ($i == 0) {
            $start = 'none';
        }
        else {
            $start = $data[$i - 1][2] + 1;
        }
        if ($i == $#data) {
            $end = 'none';
        }
        else {
            $end = $data[$i + 1][1] - 1;
        }
        print join "\t" => $id, $data[$i][0], $start, $end;
    }

    The complete solution would be:
    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    my %special = (PF03797 => 1);

    {
        local $/ = "//\n";

        while (<DATA>) {
            my ($id) = /^ID:(\w+)/;
            my @data;

            while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                push @data, [ $1, $2, $3 ];
            }

            @data = sort { $a->[2] <=> $b->[2] } @data;

            for my $i (0 .. $#data) {
                if ($special{$data[$i][0]}) {
                    my $start = $i == 0      ? 'none' : $data[$i - 1][2] + 1;
                    my $end   = $i == $#data ? 'none' : $data[$i + 1][1] - 1;
                    print join "\t" => $id, $data[$i][0], $start, $end;
                }
            }
        }
    }

    __DATA__
    ID:A0AWZ5
    HIT:PF12951 SEQ_START:120 SEQ_END:350
    HIT:PF03797 SEQ_START:822 SEQ_END:1073
    HIT:PF15789 SEQ_START:1515 SEQ_END:1547
    HIT:PF00267 SEQ_START:1200 SEQ_END:1350
    //
    The output from my code was:

    A0AWZ5  PF03797 351     1199

    Hope this helps.

Re^4: Stuck in my final step of code using array of arrays
by kcott (Archbishop) on Mar 04, 2014 at 04:06 UTC

    I think ++Cristoforo has provided the appropriate changes required (in Re^4: Stuck in my final step of code using array of arrays).

    Now that I see another example of input and expected output, I suspect 'none' is not what you want at either end of the range. I originally used it (in Re: Stuck in my final step of code using array of arrays) because your description in the OP contained "... before and after them (if any) ...".

    Here's another script that uses virtually the same changes as Cristoforo supplied, but replaces 'none' with the values I think you want: the special code's own SEQ_START when nothing precedes it, and its own SEQ_END when nothing follows it. I've included additional test data covering the four cases: codes before only, before and after, after only, and neither.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my %special = (PF03797 => 1);

    {
        local $/ = "//\n";

        while (<DATA>) {
            my ($id) = /^ID:(\w+)/;
            my @data;

            while (/HIT:(\w+).*?SEQ_START:(\d+).*?(\d+)/g) {
                push @data, [ $1, $2, $3 ];
            }

            @data = sort { $a->[2] <=> $b->[2] } @data;

            for my $i (0 .. $#data) {
                if ($special{$data[$i][0]}) {
                    my $start = $i == 0      ? $data[$i][1] : $data[$i - 1][2] + 1;
                    my $end   = $i == $#data ? $data[$i][2] : $data[$i + 1][1] - 1;
                    printf "%-41s %7s %4d %4d\n" => $id, $data[$i][0], $start, $end;
                }
            }
        }
    }

    __DATA__
    ID:A0AWZ5_1___codes_before_only
    HIT:PF12951 SCORE:40.0 EVALUE:2.2e-10 HMM_START:2 HMM_END:32 SEQ_START:421 SEQ_END:455
    HIT:PF03797 SCORE:130.7 EVALUE:3.6e-40 HMM_START:7 HMM_END:261 SEQ_START:822 SEQ_END:1073
    HIT:PF12951 SCORE:38.7 EVALUE:5.5e-10 HMM_START:1 HMM_END:32 SEQ_START:515 SEQ_END:547
    //
    ID:A0AWZ5_2___codes_before_and_after
    HIT:PF12951 SEQ_START:120 SEQ_END:350
    HIT:PF03797 SEQ_START:822 SEQ_END:1073
    HIT:PF15789 SEQ_START:1515 SEQ_END:1547
    HIT:PF00267 SEQ_START:1200 SEQ_END:1350
    //
    ID:A0AWZ5_3___codes_after_only
    HIT:PF03797 SEQ_START:822 SEQ_END:1073
    HIT:PF15789 SEQ_START:1515 SEQ_END:1547
    HIT:PF00267 SEQ_START:1200 SEQ_END:1350
    //
    ID:A0AWZ5_4___codes_neither_before_nor_after
    HIT:PF03797 SEQ_START:822 SEQ_END:1073
    //

    Output:

    A0AWZ5_1___codes_before_only              PF03797  548 1073
    A0AWZ5_2___codes_before_and_after         PF03797  351 1199
    A0AWZ5_3___codes_after_only               PF03797  822 1199
    A0AWZ5_4___codes_neither_before_nor_after PF03797  822 1073

    -- Ken
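A different route to the same ranges, offered as a sketch rather than a replacement for the scripts above: instead of sorting by SEQ_END and reading the neighbouring elements, it scans every hit in the record, takes the largest SEQ_END that lies before the special hit's start and the smallest SEQ_START after its end, and falls back to the special hit's own coordinates as kcott's version does. The helper name `gap_around` is hypothetical, and ignoring hits that overlap the special one is an assumption you may need to revisit. It also names SEQ_END in the regex rather than taking the first number after SEQ_START, uses `max`/`min` from the core module List::Util, and reads kcott's four test records from an in-memory string instead of `__DATA__`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(max min);

my %special = (PF03797 => 1);

# kcott's four test records, held in a string instead of __DATA__.
my $input = <<'END';
ID:A0AWZ5_1___codes_before_only
HIT:PF12951 SCORE:40.0 EVALUE:2.2e-10 HMM_START:2 HMM_END:32 SEQ_START:421 SEQ_END:455
HIT:PF03797 SCORE:130.7 EVALUE:3.6e-40 HMM_START:7 HMM_END:261 SEQ_START:822 SEQ_END:1073
HIT:PF12951 SCORE:38.7 EVALUE:5.5e-10 HMM_START:1 HMM_END:32 SEQ_START:515 SEQ_END:547
//
ID:A0AWZ5_2___codes_before_and_after
HIT:PF12951 SEQ_START:120 SEQ_END:350
HIT:PF03797 SEQ_START:822 SEQ_END:1073
HIT:PF15789 SEQ_START:1515 SEQ_END:1547
HIT:PF00267 SEQ_START:1200 SEQ_END:1350
//
ID:A0AWZ5_3___codes_after_only
HIT:PF03797 SEQ_START:822 SEQ_END:1073
HIT:PF15789 SEQ_START:1515 SEQ_END:1547
HIT:PF00267 SEQ_START:1200 SEQ_END:1350
//
ID:A0AWZ5_4___codes_neither_before_nor_after
HIT:PF03797 SEQ_START:822 SEQ_END:1073
//
END

# Hypothetical helper: returns (start, end) of the free range around $hit.
# Hits overlapping $hit are ignored (an assumption, not from the thread).
sub gap_around {
    my ($hit, @hits) = @_;
    my (undef, $seq_start, $seq_end) = @$hit;

    # SEQ_END values strictly before the hit, SEQ_START values strictly after it.
    my @ends_before  = map { $_->[2] } grep { $_->[2] < $seq_start } @hits;
    my @starts_after = map { $_->[1] } grep { $_->[1] > $seq_end }   @hits;

    # Fall back to the hit's own coordinates when there is no neighbour.
    my $start = @ends_before  ? max(@ends_before) + 1  : $seq_start;
    my $end   = @starts_after ? min(@starts_after) - 1 : $seq_end;
    return ($start, $end);
}

my @results;
open my $fh, '<', \$input or die "Cannot read input: $!";
{
    local $/ = "//\n";    # one record per read, as in the scripts above
    while (my $record = <$fh>) {
        my ($id) = $record =~ /^ID:(\w+)/ or next;
        my @hits;
        # Name SEQ_END explicitly instead of taking the next number.
        while ($record =~ /HIT:(\w+).*?SEQ_START:(\d+).*?SEQ_END:(\d+)/g) {
            push @hits, [ $1, $2, $3 ];
        }
        for my $hit (grep { exists $special{ $_->[0] } } @hits) {
            push @results, join("\t", $id, $hit->[0], gap_around($hit, @hits));
        }
    }
}
close $fh;

print "$_\n" for @results;
```

On these records it produces the same four ranges as kcott's output (548-1073, 351-1199, 822-1199, 822-1073), tab-separated. Because it never relies on sort order, it keeps working if the hits arrive in any order; the trade-off is a full scan of the record for every special code.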

      Many thanks to both of you, greatly appreciated!!