in reply to extract relevent lines according to array

So, what code have you written and where and how does it fail for you?

  • Comment on Re: extract relevent lines according to array

Replies are listed 'Best First'.
Re^2: extract relevent lines according to array
by coldy (Scribe) on Apr 19, 2010 at 10:53 UTC
    I have written this
    while(my$line=<DATA>){ if ($line=~m/$chrom/){ #not sure how to move to the next few lines of <DATA> +where I can test that $s<$start my ($s,$prob)=split(/\s+/,$line); if ($s<$start) { next; } else{push @values,$prob;} } }
    Basically I dont know how to loop through the lines after the line that matches the $chrom condition so I can test if those lines where $start > $s

      The trick is to use variables to memorize where you are in your processing. Either use a variable that records your "status" as a number (that is called a [no such wiki, state machine]) or use one or more variables that record different conditions (commonly called "flags"). Below is your script adapted to use a flag called $foundchrom:

      my $foundchrom=0; while(my$line=<DATA>){ if ($line=~m/^variableStep/) { if ($line=~m/$chrom/){ $foundchrom=1; } else { $foundchrom=0; } #above if-then-else could be written shorter as #$foundchrom= ($line=~m/$chrom/); next; } my ($s,$prob)=split(/\s+/,$line); if ($s<$start or $s>$stop or not $foundchrom) { next; } else{push @values,$prob;} } }
      One technique is to "remember" where you are by calling a subroutine. So below, I read data lines, when $chrom is seen, the sub is called to extract the values. The sub "knows" that we are at the right place and have found $chrom simply because it is executing. Then the appopriate values are extracted (loop could be different if we take advantage of the sorted order of the input data). The main "while" loop will quit when we either run out of DATA or the first record is found (something in @values).

      update: tested with DATA ending in EOF rather than yet another variableStep record and got a undefined $line error, so changed while in get_values() to while ( defined(my $line=<DATA>)  ) like in main loop.

      #!/usr/bin/perl -w use strict; my $chrom='chr1'; my $start=9839; my $stop=9841; my @values; while ( defined(my $line=<DATA>) and !@values) { @values = get_values() if ( $line =~ m/\=$chrom$/); } sub get_values { my @values; while ( defined(my $line=<DATA>) ) { last unless $line =~ m/^\d/; my ($tag,$value) = split(/\s+/,$line); push (@values, $value) if ($tag >= $start and $tag <= $stop); } return @values } print "@values"; #prints: 0.007 0.004 0.002 __DATA__ variableStep chrom=chr1 9837 0.010 9838 0.008 9839 0.007 9840 0.004 9841 0.002 9842 0.001 variableStep chrom=chr2 9837 0.090 9838 0.038 9839 0.017 9840 0.044 9841 0.052 9842 0.091
      This code is also possible as Perl has a tricky .. and ... operator! See Flipin good, or a total flop? for good discussion.
      my @values; while (<DATA>) { if ( (/\=$chrom$/.../^v/) =~ m/^\d+(?<!^1)$/ ) #skip /start/ and /e +nd/ { my ($tag,$value)=split; push (@values, $value) if ($tag >= $start and $tag <= $stop); } else {last if @values} #optional } print "@values"; #prints: 0.007 0.004 0.002
        Thanks for this. Now if instead of only testing one set of $chrom,$start,$Stop I have an array of these values where Id like to test each one.

        Im having a problem with exiting "while ( defined(my $line=<DATA>) and !@values)" I try to clear values but that wont work

        #!/usr/bin/perl -w use strict; my @triples = ("chr1 9837 9840", "chr1 99998 99999", "chr2 9838 9840") +; my($start,$chrom,$stop); foreach my $triple (@triples){ print "$triple :"; ($chrom,$start,$stop)=split(/\s+/,$triple); my @values=(); while ( defined(my $line=<DATA>) and !@values) { @values = get_values() if ( $line =~ m/\=$chrom$/); if(@values) { print "average ", average(\@values), "\n"; }else {print "not found: average NA \n";} } } sub get_values { my @values; while ( defined(my $line=<DATA>) ) { last unless $line =~ m/^\d/; my ($tag,$value) = split(/\s+/,$line); push (@values, $value) if ($tag >= $start and $tag <= $stop); } return @values } sub average { my ($array_ref) = @_; my $sum; my $count = scalar @$array_ref; foreach (@$array_ref) { $sum += $_; } return $sum / $count; } __DATA__ variableStep chrom=chr1 9837 0.010 9838 0.008 9839 0.007 9840 0.004 9841 0.002 9842 0.001 variableStep chrom=chr2 9837 0.090 9838 0.038 9839 0.017 9840 0.044 9841 0.052 9842 0.091