in reply to Re^2: extract relevent lines according to array
in thread extract relevent lines according to array

One technique is to "remember" where you are by calling a subroutine. So below, I read data lines, when $chrom is seen, the sub is called to extract the values. The sub "knows" that we are at the right place and have found $chrom simply because it is executing. Then the appopriate values are extracted (loop could be different if we take advantage of the sorted order of the input data). The main "while" loop will quit when we either run out of DATA or the first record is found (something in @values).

update: tested with DATA ending in EOF rather than yet another variableStep record and got a undefined $line error, so changed while in get_values() to while ( defined(my $line=<DATA>)  ) like in main loop.

#!/usr/bin/perl -w use strict; my $chrom='chr1'; my $start=9839; my $stop=9841; my @values; while ( defined(my $line=<DATA>) and !@values) { @values = get_values() if ( $line =~ m/\=$chrom$/); } sub get_values { my @values; while ( defined(my $line=<DATA>) ) { last unless $line =~ m/^\d/; my ($tag,$value) = split(/\s+/,$line); push (@values, $value) if ($tag >= $start and $tag <= $stop); } return @values } print "@values"; #prints: 0.007 0.004 0.002 __DATA__ variableStep chrom=chr1 9837 0.010 9838 0.008 9839 0.007 9840 0.004 9841 0.002 9842 0.001 variableStep chrom=chr2 9837 0.090 9838 0.038 9839 0.017 9840 0.044 9841 0.052 9842 0.091
This code is also possible as Perl has a tricky .. and ... operator! See Flipin good, or a total flop? for good discussion.
my @values; while (<DATA>) { if ( (/\=$chrom$/.../^v/) =~ m/^\d+(?<!^1)$/ ) #skip /start/ and /e +nd/ { my ($tag,$value)=split; push (@values, $value) if ($tag >= $start and $tag <= $stop); } else {last if @values} #optional } print "@values"; #prints: 0.007 0.004 0.002

Replies are listed 'Best First'.
Re^4: extract relevent lines according to array
by coldy (Scribe) on Apr 20, 2010 at 02:50 UTC
    Thanks for this. Now if instead of only testing one set of $chrom,$start,$Stop I have an array of these values where Id like to test each one.

    Im having a problem with exiting "while ( defined(my $line=<DATA>) and !@values)" I try to clear values but that wont work

    #!/usr/bin/perl -w use strict; my @triples = ("chr1 9837 9840", "chr1 99998 99999", "chr2 9838 9840") +; my($start,$chrom,$stop); foreach my $triple (@triples){ print "$triple :"; ($chrom,$start,$stop)=split(/\s+/,$triple); my @values=(); while ( defined(my $line=<DATA>) and !@values) { @values = get_values() if ( $line =~ m/\=$chrom$/); if(@values) { print "average ", average(\@values), "\n"; }else {print "not found: average NA \n";} } } sub get_values { my @values; while ( defined(my $line=<DATA>) ) { last unless $line =~ m/^\d/; my ($tag,$value) = split(/\s+/,$line); push (@values, $value) if ($tag >= $start and $tag <= $stop); } return @values } sub average { my ($array_ref) = @_; my $sum; my $count = scalar @$array_ref; foreach (@$array_ref) { $sum += $_; } return $sum / $count; } __DATA__ variableStep chrom=chr1 9837 0.010 9838 0.008 9839 0.007 9840 0.004 9841 0.002 9842 0.001 variableStep chrom=chr2 9837 0.090 9838 0.038 9839 0.017 9840 0.044 9841 0.052 9842 0.091
      As you have seen, the first code doesn't allow multiple queries to the DATA easily (would have to perhaps re-read the file or in some cases, deal with the fact that the line that ends an input record is the same thing that starts another input record).

      If the data can fit into memory, then I like to to that way rather than deal with these things. There of course many ways to do this, here is one:

      #!/usr/bin/perl -w use strict; use Data::Dumper; my %data; #a hash of array my $chrom; while ( defined(my $line =<DATA>) ) { chomp ($line); if ($line =~ /chrom=(\w+)$/) {$chrom = $1; next;} push ( @{$data{$chrom}}, $line); } my @triples = ("chr1 9837 9840", #same as your @triples "chr1 99998 99999", #just different spacing "chr2 9838 9840"); #print Dumper \%data; # uncomment this line and see what it does # a very powerful tool foreach (@triples) { my ($chrom, $start, $stop) = split; my @values = get_values(\%data, $chrom, $start, $stop); if (!@values) { print "No values for $chrom tags found between ". "$start and $stop inclusive\n"; } else { print "mean for $chrom tags {$start..$stop} is ", average(\@values),"\n"; print " values were: @values\n"; } } sub get_values { my ($HoA_ref, $chrom, $start, $stop) = @_; my @result; foreach my $number_string (@{$HoA_ref->{$chrom}}) { my ($tag, $value) = split(/\s+/,$number_string); push (@result, $value) if ($tag >= $start and $tag <= $stop); } return @result; } sub average #your average (mean) routine # { my ($array_ref) = @_; my $sum; my $count = scalar @$array_ref; foreach (@$array_ref) { $sum += $_; } return $sum / $count; } =prints mean for chr1 tags {9837..9840} is 0.00725 values were: 0.010 0.008 0.007 0.004 No values for chr1 tags found between 99998 and 99999 inclusive mean for chr2 tags {9838..9840} is 0.033 values were: 0.038 0.017 0.044 =cut __DATA__ variableStep chrom=chr1 9837 0.010 9838 0.008 9839 0.007 9840 0.004 9841 0.002 9842 0.001 variableStep chrom=chr2 9837 0.090 9838 0.038 9839 0.017 9840 0.044 9841 0.052 9842 0.091