a11 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've got a huge two-dimensional array a[x][y].
Think of x as a coordinate and y as a value at it.

At some ranges of x, there are stretches of zero-values of y, which we can call gaps, such as:

x y
1 2
2 3
3 3
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 3
12 0
13 0
14 4
The task is to 'bridge' those 'gaps' that are 'shorter' than a certain interval (xmax) and have equal y values at the 'boundaries' (y0) - that is, for all x's within this interval, set y=y0.

For example, the gap at x={3;11} should be bridged with y=y0=3 if a user-defined xmax>7.
The gap at x={11;14} should never be bridged, as the y values at its boundaries are different.

It is easy to think of a pretty-much 'language-independent' algorithm for this problem, based on two nested loops.

But my array is really huge, so I am wondering if there is a more efficient Perl-specific solution to it using Perl-specific array functions?

Sorry if it looks a bit confusing and thanks in advance for your help!

Re: A Perl-specific solution for a gap bridging problem?
by Ieronim (Friar) on Jul 08, 2006 at 13:27 UTC
    The only Perl-specific advice I can give is that a two-dimensional array is quite a bad data structure for this purpose. If you need to process pairs of numbers, the best data structure is an array of arrays containing these pairs, e.g.
    @ary = (
        [1,2], [2,3], [3,3],   # other data
    );
    I wrote a simple script that solves your problem using this data structure; it may be useful to you.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @ary = map { chomp; [split] } split /\n/, <<NUMS; # sample input data
    1 2
    2 3
    3 3
    4 0
    5 0
    6 0
    7 0
    8 0
    9 0
    10 0
    11 3
    12 0
    13 0
    14 4
    NUMS

    my @result;
    my @buffer;                  # pairs of the current run of zeros
    my $xmax = $ARGV[0] || 4;    # get the xmax value from the command line
    my $lasty;                   # last non-zero y seen

    foreach my $pair (@ary) {
        if ($pair->[1] != 0) {
            # gap too long: keep the zeros as they are
            if (@buffer >= $xmax) {
                push @result, @buffer;
                @buffer = ();
            }
            # no left boundary yet, or boundaries differ: keep the zeros
            if (@buffer && (!defined $lasty || $lasty != $pair->[1])) {
                push @result, @buffer;
                @buffer = ();
            }
            # short gap with equal boundaries: bridge it
            if (@buffer && $lasty == $pair->[1]) {
                push @result, map { [$_->[0], $lasty] } @buffer;
                @buffer = ();
            }
            $lasty = $pair->[1];
            push @result, $pair;
            next;
        }
        push @buffer, $pair;
    }
    push @result, @buffer;       # trailing zeros have no right boundary

    print join "\n", map { "@$_" } @result;
    print "\n";
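    Since the array is described as really huge, the same single-pass idea can also be applied while reading the data, so that only the current run of zeros is held in memory. The following is a minimal sketch, not a tested solution for the real data: the name bridge_stream is made up for illustration, and it assumes the input is one "x y" pair per line, sorted by x with no missing x values, and that a gap is bridged only when it has fewer than xmax zero rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): copy "x y" lines from $in to
# $out, bridging runs of fewer than $xmax zeros whose two neighbours carry
# the same non-zero y. At most $xmax rows are buffered at any time.
sub bridge_stream {
    my ($in, $out, $xmax) = @_;
    my @gap;      # x values of the current run of zeros
    my $lasty;    # last non-zero y; undef at the start or after a too-long gap

    while (my $line = <$in>) {
        my ($x, $y) = split ' ', $line;
        next unless defined $y;              # skip blank or malformed lines

        if ($y == 0) {
            push @gap, $x;
            if (@gap >= $xmax) {             # too long to bridge: emit as is
                print {$out} "$_ 0\n" for @gap;
                @gap = ();
                undef $lasty;                # the rest of this run cannot be bridged
            }
            next;
        }

        # A non-zero row closes the current gap, if there is one.
        my $fill = (defined $lasty && $lasty == $y) ? $y : 0;
        print {$out} "$_ $fill\n" for @gap;
        @gap = ();
        print {$out} "$x $y\n";
        $lasty = $y;
    }
    print {$out} "$_ 0\n" for @gap;          # trailing zeros have no right boundary
}

# Demo on the sample data from the question, read via an in-memory filehandle.
my $sample = "1 2\n2 3\n3 3\n4 0\n5 0\n6 0\n7 0\n8 0\n9 0\n10 0\n11 3\n12 0\n13 0\n14 4\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";
bridge_stream($in, \*STDOUT, 8);
```

    With real data the in-memory handle would be replaced by a handle on the input file; the trade-off is that the result is written out rather than kept in an array.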
      Thanks a lot, Ieronim!
Re: A Perl-specific solution for a gap bridging problem?
by GrandFather (Saint) on Jul 08, 2006 at 21:51 UTC

    Using an array is most likely faster and should require less memory for storage than a hash, unless gaps make up a large part of the data. If that is the case, you may wish to consider using a hash:

    use warnings;
    use strict;

    use constant kMaxGap => 7;

    my %gappyData;

    # store only the non-zero points, keyed by x
    /^(\d+)\s+(\d+)/ and $2 != 0 and $gappyData{$1} = $2 while <DATA>;

    my $lastx;
    my $lasty;

    for (sort {$a <=> $b} keys %gappyData) {
        next if ! defined $lastx;
        next if ! defined $lasty;

        my $gap = $_ - $lastx - 1;    # number of missing points in between
        next if $gap < 1;             # adjacent points: nothing to fill
        next if $gap > kMaxGap;
        next if $gappyData{$_} != $lasty;

        $gappyData{$_} = $lasty for $lastx .. $_;
    } continue {
        $lastx = $_;
        $lasty = $gappyData{$_};
    }

    for (1 .. $lastx) {
        if (defined $gappyData{$_}) {
            print "$_, $gappyData{$_}\n";
        } else {
            print "$_, -\n";
        }
    }

    __DATA__
    1 2
    2 3
    3 3
    4 0
    5 0
    6 0
    7 0
    8 0
    9 0
    10 0
    11 3
    12 0
    13 0
    14 4

    Prints:

    1, 2
    2, 3
    3, 3
    4, 3
    5, 3
    6, 3
    7, 3
    8, 3
    9, 3
    10, 3
    11, 3
    12, -
    13, -
    14, 4

    Alternatively you can use a similar technique using missing values represented by undef in an array:

    use warnings;
    use strict;

    use constant kMaxGap => 7;

    my @gappyData;

    # store only the non-zero points; missing x values stay undef
    # (expects the same __DATA__ section as the previous script)
    /^(\d+)\s+(\d+)/ and $2 != 0 and $gappyData[$1] = $2 while <DATA>;

    my $lastx;
    my $lasty;
    my $currx = 1;

    for (@gappyData[1..$#gappyData]) {
        next if ! defined $lastx;
        next if ! defined $lasty;

        my $gap = $currx - $lastx - 1;    # number of missing points in between
        next if $gap < 1;                 # adjacent points: nothing to fill
        next if $gap > kMaxGap;
        next if ! defined $gappyData[$currx] or $gappyData[$currx] != $lasty;

        $_ = $lasty for @gappyData[$lastx .. $currx];
    } continue {
        $lasty = $_, $lastx = $currx if defined $_;
        ++$currx;
    }

    $currx = 1;
    for (@gappyData[1..$#gappyData]) {
        if (defined $_) {
            print "$currx, $_\n";
        } else {
            print "$currx, -\n";
        }
        ++$currx;
    }

    which generates the same output given the same data.
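    If the zeros are kept in the array rather than dropped, the bridging can also be done in place with an array slice and the repetition operator, which is about as Perl-specific as this problem gets. A minimal sketch under stated assumptions: bridge_in_place is a made-up name, the array index stands in for x, and a gap is bridged only when it is strictly shorter than the given limit (unlike kMaxGap above, which also allows a gap of exactly that length).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): bridge gaps in place in a flat
# array of y values, where the array index stands in for x.
sub bridge_in_place {
    my ($y, $xmax) = @_;
    my $left;                              # index of the last non-zero value seen

    for my $i (0 .. $#$y) {
        next if $y->[$i] == 0;
        if (defined $left) {
            my $len = $i - $left - 1;      # zeros between the two boundaries
            if ($len > 0 && $len < $xmax && $y->[$left] == $y->[$i]) {
                # fill the whole gap with one slice assignment
                @{$y}[$left + 1 .. $i - 1] = ($y->[$i]) x $len;
            }
        }
        $left = $i;
    }
    return $y;
}

my @y = (2, 3, 3, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4);
bridge_in_place(\@y, 8);
print "@y\n";    # prints: 2 3 3 3 3 3 3 3 3 3 3 0 0 4
```

    This makes a single pass over the data and allocates nothing beyond the temporary fill list for each bridged gap.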


    DWIM is Perl's answer to Gödel