
As you have seen, the first version of the code doesn't make it easy to run multiple queries against the DATA. You would have to re-read the file for each query and, in some cases, deal with the fact that the line that ends one input record is also the line that starts the next one.
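
To make that concrete, here is a minimal single-pass sketch. It is a hypothetical reader, not the earlier code from this thread; the function name `stream_query` and the use of an in-memory filehandle in place of a real file are assumptions for illustration. The `chrom=` line is what closes one record and opens the next, and because the handle is exhausted after one pass, every additional query has to reopen (re-read) the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical single-pass reader: answers ONE query per pass over the input.
sub stream_query
{
    my ($fh, $want_chrom, $start, $stop) = @_;
    my ($chrom, @values);
    while (my $line = <$fh>)
    {
        chomp $line;
        # A header line both ends the previous record and starts a new one.
        if ($line =~ /chrom=(\w+)$/) { $chrom = $1; next; }
        next unless defined $chrom and $chrom eq $want_chrom;
        my ($tag, $value) = split /\s+/, $line;
        push @values, $value if $tag >= $start and $tag <= $stop;
    }
    return @values;
}

my $text = <<'END';
variableStep chrom=chr1
9837    0.010
9838    0.008
variableStep chrom=chr2
9837    0.090
9838    0.038
END

# Each query needs its own pass, so the handle is reopened every time.
for my $query (["chr1", 9837, 9838], ["chr2", 9838, 9838])
{
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @values = stream_query($fh, @$query);
    print "@$query: @values\n";
    close $fh;
}
```

Calling `stream_query` a second time on the same handle without reopening it returns nothing, which is exactly the re-reading cost described above.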

If the data fits into memory, I prefer to load it all once rather than deal with these issues. There are, of course, many ways to do this; here is one:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my %data;  # hash of arrays (HoA): chrom name => [ data lines ]
my $chrom;

while ( defined(my $line =<DATA>) )
{
    chomp ($line);
    if ($line =~ /chrom=(\w+)$/) {$chrom = $1; next;}
    push ( @{$data{$chrom}}, $line);      
}

my @triples = ("chr1 9837   9840",    #same as your @triples
               "chr1 99998 99999",    #just different spacing
               "chr2 9838   9840");
               
#print Dumper \%data;    # uncomment this line and see what it does
                         # a very powerful tool
               
foreach (@triples)
{
    my ($chrom, $start, $stop) = split;
    my @values = get_values(\%data, $chrom, $start, $stop);
    if (!@values)
    {
        print "No values for $chrom tags found between ".
              "$start and $stop inclusive\n";
    }
    else
    {
        print "mean for $chrom tags {$start..$stop} is ",
               average(\@values),"\n";
        print "  values were: @values\n";
    }
}

sub get_values
{
   my ($HoA_ref, $chrom, $start, $stop) = @_;
   my @result;
   foreach my $number_string (@{$HoA_ref->{$chrom}})
   {
       my ($tag, $value) = split(/\s+/,$number_string);
       push (@result, $value) if ($tag >= $start and $tag <= $stop);
   }
   return @result;
}

sub average   # your average (mean) routine
{
    my ($array_ref) = @_;
    my $count = scalar @$array_ref;
    return undef unless $count;    # avoid division by zero on an empty list
    my $sum = 0;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}

=prints
mean for chr1 tags {9837..9840} is 0.00725
  values were: 0.010 0.008 0.007 0.004
No values for chr1 tags found between 99998 and 99999 inclusive
mean for chr2 tags {9838..9840} is 0.033
  values were: 0.038 0.017 0.044
=cut

__DATA__
variableStep chrom=chr1
9837    0.010
9838    0.008
9839    0.007
9840    0.004
9841    0.002
9842    0.001
variableStep chrom=chr2
9837    0.090
9838    0.038
9839    0.017
9840    0.044
9841    0.052
9842    0.091
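
If many queries run against the same data, one possible refinement (my own variation, not part of the code above) is to split each line once while loading and store a hash of hashes, chrom => { tag => value }. A query then looks up each tag in the range directly instead of re-splitting every line for that chromosome. This sketch assumes the tags are plain integers without leading zeros, as in the sample data, and that query ranges are narrow enough to enumerate; the names `load_hoh` and `get_values_hoh` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant: load into a hash of hashes, chrom => { tag => value }.
sub load_hoh
{
    my ($fh) = @_;
    my (%data, $chrom);
    while (my $line = <$fh>)
    {
        chomp $line;
        next unless $line =~ /\S/;                    # skip blank lines
        if ($line =~ /chrom=(\w+)$/) { $chrom = $1; next; }
        next unless defined $chrom;                   # ignore lines before the first header
        my ($tag, $value) = split ' ', $line;         # split once, at load time
        $data{$chrom}{$tag} = $value;
    }
    return \%data;
}

# Look up each integer tag in [start, stop]; tags not present are skipped.
sub get_values_hoh
{
    my ($HoH_ref, $chrom, $start, $stop) = @_;
    my $tags = $HoH_ref->{$chrom} or return;
    return map { exists $tags->{$_} ? $tags->{$_} : () } $start .. $stop;
}

my $text = <<'END';
variableStep chrom=chr1
9837    0.010
9838    0.008
9839    0.007
9840    0.004
variableStep chrom=chr2
9838    0.038
9839    0.017
9840    0.044
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $data = load_hoh($fh);
close $fh;

my @values = get_values_hoh($data, "chr1", 9837, 9840);
print "chr1: @values\n";    # prints "chr1: 0.010 0.008 0.007 0.004"
```

The trade-off is that lookup cost grows with the width of the range rather than with the number of lines per chromosome, so for very wide ranges the linear scan in `get_values` may still be the better choice.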

In reply to Re^5: extract relevent lines according to array by Marshall
in thread extract relevent lines according to array by coldy
