Using an array is most likely faster and should require less memory than a hash, unless gaps make up a large part of the data. If that is the case, you may wish to consider using a hash:
use warnings;
use strict;
use constant kMaxGap => 7;
my %gappyData;
/^(\d+)\s+(\d+)/ and $2 != 0 and $gappyData{$1} = $2 while <DATA>;
my $lastx;
my $lasty;
for (sort {$a <=> $b} keys %gappyData) {
    next if ! defined $lastx;
    next if ! defined $lasty;
    my $gap = $_ - $lastx - 1;          # number of missing x values
    next if $gap < 1;                   # adjacent points: nothing to fill
    next if $gap > kMaxGap;             # gap too long to bridge
    next if $gappyData{$_} != $lasty;   # only bridge between equal values
    $gappyData{$_} = $lasty for $lastx .. $_;
} continue {
    $lastx = $_;
    $lasty = $gappyData{$_};
}
for (1 .. $lastx) {
    if (defined $gappyData{$_}) {
        print "$_, $gappyData{$_}\n";
    } else {
        print "$_, -\n";
    }
}
__DATA__
1 2
2 3
3 3
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 3
12 0
13 0
14 4
Prints:
1, 2
2, 3
3, 3
4, 3
5, 3
6, 3
7, 3
8, 3
9, 3
10, 3
11, 3
12, -
13, -
14, 4
Alternatively, you can use a similar technique with an array, representing missing values as undef:
use warnings;
use strict;
use constant kMaxGap => 7;
my @gappyData;
/^(\d+)\s+(\d+)/ and $2 != 0 and $gappyData[$1] = $2 while <DATA>;
my $lastx;
my $lasty;
my $currx = 1;
for (@gappyData[1..$#gappyData]) {
    next if ! defined $lastx;
    next if ! defined $lasty;
    my $gap = $currx - $lastx - 1;      # number of missing x values
    next if $gap < 1;                   # adjacent points: nothing to fill
    next if $gap > kMaxGap;             # gap too long to bridge
    next if ! defined $gappyData[$currx] or $gappyData[$currx] != $lasty;
    $_ = $lasty for @gappyData[$lastx .. $currx];
} continue {
    $lasty = $_, $lastx = $currx if defined $_;
    ++$currx;
}
$currx = 1;
for (@gappyData[1..$#gappyData]) {
    if (defined $_) {
        print "$currx, $_\n";
    } else {
        print "$currx, -\n";
    }
    ++$currx;
}
This generates the same output given the same data.
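The sample data never contains a gap longer than kMaxGap, so the limit is not exercised by either program. Below is a minimal sketch of the hash technique wrapped in a subroutine so that case can be tried as well. The name fill_gaps and its arguments are my own invention for illustration, not something from the code above, and the sketch assumes that a run of just one missing value should be filled too.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): fill runs of missing
# x values when the defined neighbours on both sides carry the same y and
# the run is no longer than $maxGap.
sub fill_gaps {
    my ($data, $maxGap) = @_;   # $data: hash ref of x => y, zeros already dropped
    my %filled = %$data;        # work on a copy so the caller's hash is untouched
    my ($lastx, $lasty);

    for my $x (sort { $a <=> $b } keys %$data) {
        if (defined $lastx) {
            my $gap = $x - $lastx - 1;   # number of missing x values in between
            if ($gap >= 1 && $gap <= $maxGap && $data->{$x} == $lasty) {
                $filled{$_} = $lasty for $lastx + 1 .. $x - 1;
            }
        }
        ($lastx, $lasty) = ($x, $data->{$x});
    }
    return \%filled;
}

# x = 4..6 is a gap of 3 between equal values; x = 8..16 is a gap of 9.
my $filled = fill_gaps({ 3 => 5, 7 => 5, 17 => 5 }, 7);
for my $x (3 .. 17) {
    printf "%d, %s\n", $x, defined $filled->{$x} ? $filled->{$x} : '-';
}
```

With a limit of 7, x = 4 through 6 should come out as 5, while x = 8 through 16 should stay undefined and print as "-". Returning a copy rather than modifying the hash in place is only a design choice for the sketch; filling in place, as the code above does, works just as well.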
DWIM is Perl's answer to Gödel