This will produce blocks where each value is within range of its neighbour, but not necessarily where all values are within range of each other - for example, the second block in the output ranges from 1 to 2 for a required maximum deviation of 0.5.

To produce the desired result, you need to keep track of the maximum and minimum values within the candidate set, and test against these.
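As a minimal sketch of that test: the sub name fits_window and the sample values here are made up for illustration only. It checks a new value against the running minimum and maximum rather than against the previous value.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): true if adding $value
# keeps the whole set within $deviation, judged against the set's
# running minimum and maximum.
sub fits_window {
    my ($value, $min, $max, $deviation) = @_;
    my $new_min = $value < $min ? $value : $min;
    my $new_max = $value > $max ? $value : $max;
    return ($new_max - $new_min) <= $deviation ? 1 : 0;
}

# Illustrative values: each step is only 0.5 from its neighbour,
# but the set as a whole would span 1.0.
my @values = (1, 1.5, 2);
my ($min, $max) = ($values[0], $values[0]);
for my $v (@values[1 .. $#values]) {
    if (fits_window($v, $min, $max, 0.5)) {
        $min = $v if $v < $min;
        $max = $v if $v > $max;
        print "$v fits (range now $min..$max)\n";
    }
    else {
        print "$v would stretch the range beyond 0.5\n";
    }
}
```

A neighbour-to-neighbour comparison would accept all three values; the min/max test rejects the last one.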

Also, if you drop the whole candidate set as soon as the latest value doesn't fit, you will miss sequences that start in the middle of the current set and exclude its early values. Instead, add the latest value to the end, then chop values off the beginning until you are back to a qualifying set. This may end up being just the latest value, but it could be larger.
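The trimming step on its own might look something like this. It is only a sketch: slide_window is a hypothetical name, and it recomputes the minimum and maximum on every pass with List::Util, which is simple rather than efficient.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper (name is an assumption): append $value to @$window,
# then drop values from the front until max - min of what remains is
# within $deviation. A single value always qualifies, so the loop ends.
sub slide_window {
    my ($window, $value, $deviation) = @_;
    push @$window, $value;
    shift @$window while max(@$window) - min(@$window) > $deviation;
    return $window;
}

# Illustrative values only
my @window = (30.0, 30.0, 30.5);
slide_window(\@window, 30.6, 0.5);
print "@window\n";
```

Here both leading 30.0 values are dropped, leaving 30.5 and 30.6 as the start of the next candidate set.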

The following code shows one way to do this. It uses the same changed block size and deviation as tangent's example, and adds some more data to illustrate overlapping qualifying sets.

use strict;
use warnings;

# Set block size and deviation
my $block_size = 4;
my $deviation  = 0.5;

# Initialise the candidates, maximum and minimum values
my $candidates = [[split(' ', <DATA>)]];
my $maxval = my $minval = $candidates->[0][3];

# Loop through the rest of the DATA
while (<DATA>) {
    my $current = [split];
    if (within_range($current->[3], $maxval, $minval)) {
        push(@$candidates, $current);
        $maxval = $current->[3] if $maxval < $current->[3];
        $minval = $current->[3] if $minval > $current->[3];
    }
    elsif (@$candidates >= $block_size) {
        print_block($candidates);
        push(@$candidates, $current);
        ($candidates, $maxval, $minval) = trim_candidates($candidates);
    }
    else {
        push(@$candidates, $current);
        ($candidates, $maxval, $minval) = trim_candidates($candidates);
    }
}

# deal with stragglers
if ( @$candidates >= $block_size ) {
    print_block( $candidates );
}

sub within_range {
    my ($testval, $testmax, $testmin) = @_;
    return 0 if $testmax - $testval > $deviation;
    return 0 if $testval - $testmin > $deviation;
    return 1;
}

sub print_block {
    my ( $lines ) = @_;
    print "BLOCK\n";
    for my $line ( @$lines ) {
        print join(' ', @$line ), "\n";
    }
}

sub trim_candidates {
    my $worklist = shift;

    # drop the first entry
    shift @$worklist;

    # Check if the remaining worklist qualifies
    my $workmax = my $workmin = $worklist->[0][3];
    foreach my $item (@$worklist) {
        return trim_candidates($worklist)
            unless within_range($item->[3], $workmax, $workmin);
        $workmax = $item->[3] if $workmax < $item->[3];
        $workmin = $item->[3] if $workmin > $item->[3];
    }
    return ($worklist, $workmax, $workmin);
}

0;

__DATA__
1 10492 rs55998931 0.272727272727273 0.4375
1 13418 . 0.25 0.0625
1 13752 . 0.153846153846154 0.25
1 13813 . 0.0357142857142857 0.2
1 13838 . 0.0357142857142857 0.2
1 14907 rs79585140 0.5 0.555555555555556
1 14930 rs75454623 0.535714285714286 0.611111111111111
1 14933 rs199856693 0.0357142857142857 0.0555555555555556
1 14948 rs201855936 0.107142857142857 0
1 10492 rs55998931 1 0.4375
1 10492 rs55998931 1.5 0.4375
1 10492 rs55998931 1.9 0.4375
1 10492 rs55998931 2 0.4375
1 10492 rs55998931 2.6 0.4375
1 13418 blah 20.0 blah
1 13418 blah 20.1 blah
1 13418 blah 20.2 blah
1 13418 blah 20.3 blah
1 13418 blah 20.4 blah
1 13418 blah 20.5 blah
1 13418 blah 20.6 blah
1 13418 blah 20.7 blah
1 13418 blah 30.5 blah
1 13418 blah 30.0 blah
1 13418 blah 30.0 blah
1 13418 blah 30.5 blah
1 13418 blah 30.6 blah
1 13418 blah 30.9 blah
1 13418 blah 30.6 blah
1 13418 blah 30.9 blah

Output:

BLOCK
1 10492 rs55998931 0.272727272727273 0.4375
1 13418 . 0.25 0.0625
1 13752 . 0.153846153846154 0.25
1 13813 . 0.0357142857142857 0.2
1 13838 . 0.0357142857142857 0.2
1 14907 rs79585140 0.5 0.555555555555556
BLOCK
1 13418 blah 20.0 blah
1 13418 blah 20.1 blah
1 13418 blah 20.2 blah
1 13418 blah 20.3 blah
1 13418 blah 20.4 blah
1 13418 blah 20.5 blah
BLOCK
1 13418 blah 20.1 blah
1 13418 blah 20.2 blah
1 13418 blah 20.3 blah
1 13418 blah 20.4 blah
1 13418 blah 20.5 blah
1 13418 blah 20.6 blah
BLOCK
1 13418 blah 20.2 blah
1 13418 blah 20.3 blah
1 13418 blah 20.4 blah
1 13418 blah 20.5 blah
1 13418 blah 20.6 blah
1 13418 blah 20.7 blah
BLOCK
1 13418 blah 30.5 blah
1 13418 blah 30.0 blah
1 13418 blah 30.0 blah
1 13418 blah 30.5 blah
BLOCK
1 13418 blah 30.5 blah
1 13418 blah 30.6 blah
1 13418 blah 30.9 blah
1 13418 blah 30.6 blah
1 13418 blah 30.9 blah

In reply to Re^2: extract values from a field that are consecutive and within one standard deviation of each other by Myrddin Wyllt
in thread extract values from a field that are consecutive and within one standard deviation of each other by mulder4786
