wow... so I implemented the wiki page's pseudocode, and benchmarked my code vs the Longest Increasing Subsequence:
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX qw(ceil);
sub lis(@) {
# lis = Longest Increasing Subsequence: algorithm from https://en.
+wikipedia.org/wiki/Longest_increasing_subsequence, referenced
# by BrowserUK at https://perlmonks.org/?node_id=1174579
my @X = @_;
my (@P, @M);
$#P = $#X;
$#M = $#X + 1;
my $L = 0;
my $N = @X;
for my $i ( 0 .. $N-1 ) {
# binary search for j <= L
# such that X[ M[j] ] < X[ i ]
my $lo = 1;
my $hi = $L;
while( $lo <= $hi) {
my $mid = ceil(($lo+$hi)/2.);
if( $X[ $M[$mid] ] < $X[$i] ) {
$lo = $mid + 1;
} else {
$hi = $mid - 1;
}
}
# after searching, lo is 1 greater than the
# length of the longest prefix of X[i]
my $newL = $lo;
# the predecessor of X[i] is the last index
# of the subsequence of length newL-1
$P[$i] = $M[$newL -1 ];
$M[$newL] = $i;
$L = $newL if $newL > $L; # if we found a subseq longer than
+ any prev, update L
}
my @S; $#S = $L - 1;
my $k = $M[$L];
for my $j (0 .. $L-1) {
my $i = ($L-1)-$j;
$S[$i] = $X[$k];
$k = $P[$k];
}
return wantarray ? @S : [@S];
}
use Algorithm::Combinatorics qw/combinations/;
sub pryrt(@) {
# my algorithm, from https://perlmonks.org/?node_id=1174328
my @array = @_;
my $Fcount = @array;
my @a;
while( $Fcount ) {
my $iter = combinations( \@array, $Fcount );
while( my $p = $iter->next() ) {
my $monotonic = 1;
@a = @$p;
foreach my $i ( 0.. $#a-1 ) {
my $j = $i + 1;
$monotonic &&= $a[$j] > $a[$i];
last unless $monotonic;
}
return @a if $monotonic;
}
--$Fcount;
}
return wantarray ? @a : [@a];
}
sub genarray {
my $len = shift || rand 100;
my $maxr = shift || rand 999;
return map { int rand $maxr } 1 .. $len;
}
local $\ = "\n";
local $, = " ";
my @array = genarray(20, 999);
print scalar @array, "[@array]";
print "lis: [", lis(@array), "]";
print "pryrt: [", pryrt(@array), "]";
use Benchmark qw(cmpthese timethese);
my $rv = timethese(-10, {
lis => sub { lis @array; },
pryrt => sub { pryrt @array; },
});
cmpthese $rv;
results:
20 [953 317 563 582 789 629 613 918 9 680 276 163 119 915 967 426 820
+227 256 650]
lis: [ 317 563 582 613 680 915 967 ]
pryrt: [ 317 563 582 629 680 915 967 ]
Benchmark:
running
lis, pryrt
for at least 10 CPU seconds
...
lis: 10 wallclock secs (10.51 usr + 0.00 sys = 10.51 CPU) @ 25
+210.27/s (n=265086)
pryrt: 11 wallclock secs (11.26 usr + 0.00 sys = 11.26 CPU) @
+ 0.18/s (n=2)
(warning: too few iterations for a reliable count)
s/iter pryrt lis
pryrt 5.63 -- -100%
lis 3.97e-005 14197064% --
With a random list just twenty elements long, it went from 5-6s per run for my code, to better than 25_000 runs per second with the efficient algorithm: that's almost 150_000 times faster, and that's just with 20 elements in the array. (My original benchmark tried cmpthese() on random length arrays, using the defaults to my genarray() function, but after a minute, I didn't feel like waiting that long, so switched to a 10sec timethese()/cmpthese() pair. Multiple runs give similar results.)
When run-time is money, it pays to search for known-good algorithms, rather than just trusting that you've thought of an efficient algorithm!
Thanks ++gilthoniel for an interesting thread, and ++BrowserUK, for once again bringing his knowledge of efficient coding practices to help us all.
To all who come upon this thread: definitely use the efficient code, rather than my original code! | [reply] [d/l] [select] |
I also implemented the wiki pseudo-code, resulting almost identical code: #! perl -slw
use strict;
use Time::HiRes qw[ time ];
use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 500;
our $S //= 0; srand $S if $S;
our $N //= 20;
sub LIS {
my $X = shift;
my( @P, @M );
my $L = 0;
for my $i ( 0 .. $#$X ) {
## Binary search for the largest positive j = L such that X[M[
+j]] < X[i]
my $lo = 1;
my $hi = $L;
while( $lo <= $hi ) {
my $mid = int( ( $lo + $hi ) / 2 );
if( $X->[ $M[ $mid ] ] < $X->[ $i ] ) {
$lo = $mid + 1;
}
else{
$hi = $mid - 1;
}
}
## After searching, lo is 1 greater than the length of the lon
+gest prefix of X[i]
my $newL = $lo;
## The predecessor of X[i] is the last index of the subsequenc
+e of length newL-1
$P[ $i ] = $M[ $newL - 1 ];
$M[ $newL ] = $i;
if( $newL > $L ) {
## If we found a subsequence longer than any we've found y
+et, update L
$L = $newL;
}
}
## Reconstruct the longest increasing subsequence
my @S;
my $k = $M[ $L ];
for my $i ( reverse 0 .. $L-1 ) {
$S[ $i ] = $X->[ $k ];
$k = $P[ $k ];
}
return @S;
}
my @data = map int( rand $N ), 1 .. $N;
my $start = time;
my @best = LIS( \@data );
printf "best(%d): @best\n", scalar @best;
printf "Took %.3f seconds\n", time() - $start;
Though I pass the data in via a reference for efficiency.
I've been trying to wrap my brain around the mentioned Knuth optimisation without success.
One comment on your code. This return wantarray ? @S : [@S]; throws away the advantage of returning a reference, by constructing a list from the existing array, which is then used to construct another (anonymous) array, a reference to which is returned.
Better to just do return wantarray ? @S : \@S;
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |