The trick is to use variables to memorize where you are in your processing. Either use a variable that records your "status" as a number (that is called a [no such wiki, state machine]) or use one or more variables that record different conditions (commonly called "flags"). Below is your script adapted to use a flag called $foundchrom:
my $foundchrom=0;
while(my$line=<DATA>){
if ($line=~m/^variableStep/) {
if ($line=~m/$chrom/){
$foundchrom=1;
}
else {
$foundchrom=0;
}
#above if-then-else could be written shorter as
#$foundchrom= ($line=~m/$chrom/);
next;
}
my ($s,$prob)=split(/\s+/,$line);
if ($s<$start or $s>$stop or not $foundchrom) {
next;
}
else{push @values,$prob;}
}
}
| [reply] [d/l] |
One technique is to "remember" where you are by calling a subroutine. So below, I read data lines, when $chrom is seen, the sub is called to extract the values. The sub "knows" that we are at the right place and have found $chrom simply because it is executing. Then the appopriate values are extracted (loop could be different if we take advantage of the sorted order of the input data). The main "while" loop will quit when we either run out of DATA or the first record is found (something in @values).
update: tested with DATA ending in EOF rather than yet another variableStep record and got a undefined $line error, so changed while in get_values() to while ( defined(my $line=<DATA>) ) like in main loop.
#!/usr/bin/perl -w
use strict;
my $chrom='chr1';
my $start=9839;
my $stop=9841;
my @values;
while ( defined(my $line=<DATA>) and !@values)
{
@values = get_values() if ( $line =~ m/\=$chrom$/);
}
sub get_values
{
my @values;
while ( defined(my $line=<DATA>) )
{
last unless $line =~ m/^\d/;
my ($tag,$value) = split(/\s+/,$line);
push (@values, $value) if ($tag >= $start and $tag <= $stop);
}
return @values
}
print "@values"; #prints: 0.007 0.004 0.002
__DATA__
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091
This code is also possible as Perl has a tricky .. and ... operator! See Flipin good, or a total flop? for good discussion.
my @values;
while (<DATA>)
{
if ( (/\=$chrom$/.../^v/) =~ m/^\d+(?<!^1)$/ ) #skip /start/ and /e
+nd/
{
my ($tag,$value)=split;
push (@values, $value) if ($tag >= $start and $tag <= $stop);
}
else {last if @values} #optional
}
print "@values"; #prints: 0.007 0.004 0.002
| [reply] [d/l] [select] |
#!/usr/bin/perl -w
use strict;
my @triples = ("chr1 9837 9840", "chr1 99998 99999", "chr2 9838 9840")
+;
my($start,$chrom,$stop);
foreach my $triple (@triples){
print "$triple :";
($chrom,$start,$stop)=split(/\s+/,$triple);
my @values=();
while ( defined(my $line=<DATA>) and !@values)
{
@values = get_values() if ( $line =~ m/\=$chrom$/);
if(@values)
{
print "average ", average(\@values), "\n";
}else {print "not found: average NA \n";}
}
}
sub get_values
{
my @values;
while ( defined(my $line=<DATA>) )
{
last unless $line =~ m/^\d/;
my ($tag,$value) = split(/\s+/,$line);
push (@values, $value) if ($tag >= $start and $tag <= $stop);
}
return @values
}
sub average
{
my ($array_ref) = @_;
my $sum;
my $count = scalar @$array_ref;
foreach (@$array_ref) { $sum += $_; }
return $sum / $count;
}
__DATA__
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091
| [reply] [d/l] |
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my %data; #a hash of array
my $chrom;
while ( defined(my $line =<DATA>) )
{
chomp ($line);
if ($line =~ /chrom=(\w+)$/) {$chrom = $1; next;}
push ( @{$data{$chrom}}, $line);
}
my @triples = ("chr1 9837 9840", #same as your @triples
"chr1 99998 99999", #just different spacing
"chr2 9838 9840");
#print Dumper \%data; # uncomment this line and see what it does
# a very powerful tool
foreach (@triples)
{
my ($chrom, $start, $stop) = split;
my @values = get_values(\%data, $chrom, $start, $stop);
if (!@values)
{
print "No values for $chrom tags found between ".
"$start and $stop inclusive\n";
}
else
{
print "mean for $chrom tags {$start..$stop} is ",
average(\@values),"\n";
print " values were: @values\n";
}
}
sub get_values
{
my ($HoA_ref, $chrom, $start, $stop) = @_;
my @result;
foreach my $number_string (@{$HoA_ref->{$chrom}})
{
my ($tag, $value) = split(/\s+/,$number_string);
push (@result, $value) if ($tag >= $start and $tag <= $stop);
}
return @result;
}
sub average #your average (mean) routine #
{
my ($array_ref) = @_;
my $sum;
my $count = scalar @$array_ref;
foreach (@$array_ref) { $sum += $_; }
return $sum / $count;
}
=prints
mean for chr1 tags {9837..9840} is 0.00725
values were: 0.010 0.008 0.007 0.004
No values for chr1 tags found between 99998 and 99999 inclusive
mean for chr2 tags {9838..9840} is 0.033
values were: 0.038 0.017 0.044
=cut
__DATA__
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091
| [reply] [d/l] |