Renyulb28 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I would like to know where to start in going about solving this issue. I currently have a dataset with a column of start coordinates and a column of stop coordinates, and a column of quantitative values for each interval. The intervals are a sliding window that moves up for each row.So it looks something like this:

12345 12445 0.454 12346 12446 0.326 12347 12447 0.355 . . .

The other dataset are the specific coordinates of the important markers, and has a column of specific coordinates:

12435 13455 13532 16135

I would like to know how I can go about compiling all the intervals from the first dataset that each marker from the second dataset falls in, and averaging all of the quantitative values for those intervals, thus getting an average quantitative value for each marker. Some windows will encompass more that one marker, so all of those markers will receive the same average.

Thank you for any guidance.

Replies are listed 'Best First'.
Re: Sliding window intervals and average values for specific coordinates
by johngg (Canon) on Jan 12, 2012 at 22:07 UTC

    I'm not quite sure what you mean by your last sentence as I wouldn't have thought all the markers in one particular window will all be members of the same windows. Anyway, is this something like what you are trying to do?

    use strict; use warnings; use 5.010; use List::Util qw{ sum }; use Data::Dumper; open my $windowsFH, q{<}, \ <<EOD or die qq{open: < HEREDOC: $!\n}; 10 19 0.287 11 20 0.502 12 21 0.462 13 22 0.407 14 23 0.254 15 24 0.335 16 25 0.474 EOD my %values; while ( <$windowsFH> ) { my( $min, $max, $value ) = split; push @{ $values{ $_ } }, $value for $min .. $max; } close $windowsFH or die qq{close: < HEREDOC: $!\n}; print Data::Dumper ->Dumpxs( [ \ %values ], [ qw{ *values } ] ); open my $markersFH, q{<}, \ <<EOD or die qq{open: < HEREDOC: $!\n}; 12 15 17 EOD while ( <$markersFH> ) { chomp; say qq{$_ - }, ( sum @{ $values{ $_ } } ) / @{ $values{ $_ } }; } close $markersFH or die qq{close: < HEREDOC: $!\n};

    The output.

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Sliding window intervals and average values for specific coordinates
by GrandFather (Saint) on Jan 12, 2012 at 22:09 UTC

    If the coordinates are integral and fewer than a few tens of million then you could simply generate a hash which uses coordinates as keys and averages as values then its a simple matter of looking up the hash for each marker:

    use strict; use warnings; use lib 'C:\Users\Peter\Tools\BuildManager'; my $data = <<DATA; 12345 12445 0.454 12346 12446 0.326 12347 12447 0.355 DATA my $markers = <<MARKERS; 12435 13455 13532 16135 MARKERS my %dataLU; open my $dataIn, '<', \$data; while (defined (my $line = <$dataIn>)) { chomp $line; my ($start, $end, $avg) = split ' ', $line; $dataLU{$_}{sum} += $avg for $start .. $end; ++$dataLU{$_}{count} for $start .. $end; } close $dataIn; open my $markersIn, '<', \$markers; while (defined (my $line = <$markersIn>)) { chomp $line; next if ! length $line; if (! exists $dataLU{$line}) { print "No value for $line\n"; next; } printf "Marker $line: %f\n", $dataLU{$line}{sum} / $dataLU{$line}{ +count}; } close $markersIn;

    Prints:

    Marker 12435: 0.378333 No value for 13455 No value for 13532 No value for 16135
    True laziness is hard work
Re: Sliding window intervals and average values for specific coordinates
by BrowserUk (Patriarch) on Jan 12, 2012 at 22:15 UTC

    Something like this?

    #! perl -slw use strict; use List::Util qw[ sum ]; sub ave{ (sum( @_ )||0) / (@_||1) } my @intervals = map[ split ], do{ local @ARGV = 'intervals.dat'; <> }; chomp @intervals; my @markers = do{ local @ARGV = 'markers.dat'; <> }; chomp @markers; my %groups; my $first = 0; for my $int ( @intervals ) { my( $start, $stop, $val ) = @{ $int }; ++$first while $first < $#markers and $markers[ $first ] < $start; my $next = $first; push @{ $groups{ $markers[ $next ] } }, $val while $next++ < $#markers and $markers[ $next ] < $stop; } for my $marker ( @markers ) { print "$marker ", ave( @{ $groups{ $marker } } ); } __END__ C:\test>947611 10150 0 10250 0 10350 0.772 10450 0.594666666666667 10550 0.60125 10650 0.60125 10750 0.5585 10850 0.585333333333333 ... 18150 0.37475 18250 0.571 18350 0.523166666666667 18450 0.499 18550 0.4958 18650 0.4958 18750 0.372 18850 0.333714285714286

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: Sliding window intervals and average values for specific coordinates
by locked_user sundialsvc4 (Abbot) on Jan 13, 2012 at 14:11 UTC

    The solution that first popped into my head involved the use of an SQL JOIN to take advantage of the one-to-many relationship between markers and windows.   Coding off the top of my head, something like:

    SELECT m.coordinate, AVERAGE(i.quantitativeValue) FROM markers m INNER JOIN intervals i ON (m.coordinate >= i.lowCoordinate) AND (m.coordinate <= i.highCoordinate) GROUP BY m.coordinate

    Looking over the SQL I’ve just written, I wonder if this is, indeed, the solution to your problem.   The more I look at it, the more that I think that it is.   This will equally accommodate there being just one intervals row for each marker, or more than one.