Re^6: Longest Increasing Subset

Replies are listed 'Best First'.
Re^7: Longest Increasing Subset by pryrt (Abbot) on Oct 24, 2016 at 14:49 UTC
wow... so I implemented the wiki page's pseudocode, and benchmarked my code vs the Longest Increasing Subsequence: #!/usr/bin/env perl use strict; use warnings; use POSIX qw(ceil); sub lis(@) { # lis = Longest Increasing Subsequence: algorithm from https://en. +wikipedia.org/wiki/Longest_increasing_subsequence, referenced # by BrowserUK at https://perlmonks.org/?node_id=1174579 my @X = @_; my (@P, @M); $#P = $#X; $#M = $#X + 1; my $L = 0; my $N = @X; for my $i ( 0 .. $N-1 ) { # binary search for j <= L # such that X[ M[j] ] < X[ i ] my $lo = 1; my $hi = $L; while( $lo <= $hi) { my $mid = ceil(($lo+$hi)/2.); if( $X[ $M[$mid] ] < $X[$i] ) { $lo = $mid + 1; } else { $hi = $mid - 1; } } # after searching, lo is 1 greater than the # length of the longest prefix of X[i] my $newL = $lo; # the predecessor of X[i] is the last index # of the subsequence of length newL-1 $P[$i] = $M[$newL -1 ]; $M[$newL] = $i; $L = $newL if $newL > $L; # if we found a subseq longer than + any prev, update L } my @S; $#S = $L - 1; my $k = $M[$L]; for my $j (0 .. $L-1) { my $i = ($L-1)-$j; $S[$i] = $X[$k]; $k = $P[$k]; } return wantarray ? @S : [@S]; } use Algorithm::Combinatorics qw/combinations/; sub pryrt(@) { # my algorithm, from https://perlmonks.org/?node_id=1174328 my @array = @_; my $Fcount = @array; my @a; while( $Fcount ) { my $iter = combinations( \@array, $Fcount ); while( my $p = $iter->next() ) { my $monotonic = 1; @a = @$p; foreach my $i ( 0.. $#a-1 ) { my $j = $i + 1; $monotonic &&= $a[$j] > $a[$i]; last unless $monotonic; } return @a if $monotonic; } --$Fcount; } return wantarray ? @a : [@a]; } sub genarray { my $len = shift \|\| rand 100; my $maxr = shift \|\| rand 999; return map { int rand $maxr } 1 .. $len; } local $\ = "\n"; local $, = " "; my @array = genarray(20, 999); print scalar @array, "[@array]"; print "lis: [", lis(@array), "]"; print "pryrt: [", pryrt(@array), "]"; use Benchmark qw(cmpthese timethese); my $rv = timethese(-10, { lis => sub { lis @array; }, pryrt => sub { pryrt @array; }, }); cmpthese $rv; [download] results: `20 [953 317 563 582 789 629 613 918 9 680 276 163 119 915 967 426 820 +227 256 650] lis: [ 317 563 582 613 680 915 967 ] pryrt: [ 317 563 582 629 680 915 967 ] Benchmark: running lis, pryrt for at least 10 CPU seconds ... lis: 10 wallclock secs (10.51 usr + 0.00 sys = 10.51 CPU) @ 25 +210.27/s (n=265086) pryrt: 11 wallclock secs (11.26 usr + 0.00 sys = 11.26 CPU) @ + 0.18/s (n=2) (warning: too few iterations for a reliable count) s/iter pryrt lis pryrt 5.63 -- -100% lis 3.97e-005 14197064% --` [download] With a random list just twenty elements long, it went from 5-6s per run for my code, to better than 25_000 runs per second with the efficient algorithm: that's almost 150_000 times faster, and that's just with 20 elements in the array. (My original benchmark tried cmpthese() on random length arrays, using the defaults to my genarray() function, but after a minute, I didn't feel like waiting that long, so switched to a 10sec timethese()/cmpthese() pair. Multiple runs give similar results.) When run-time is money, it pays to search for known-good algorithms, rather than just trusting that you've thought of an efficient algorithm! Thanks ++gilthoniel for an interesting thread, and ++BrowserUK, for once again bringing his knowledge of efficient coding practices to help us all. To all who come upon this thread: definitely use the efficient code, rather than my original code!	[reply] [d/l] [select]
Re^8: Longest Increasing Subset by BrowserUk (Patriarch) on Oct 24, 2016 at 16:06 UTC
I also implemented the wiki pseudo-code, resulting almost identical code: #! perl -slw use strict; use Time::HiRes qw[ time ]; use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 500; our $S //= 0; srand $S if $S; our $N //= 20; sub LIS { my $X = shift; my( @P, @M ); my $L = 0; for my $i ( 0 .. $#$X ) { ## Binary search for the largest positive j = L such that X[M[ +j]] < X[i] my $lo = 1; my $hi = $L; while( $lo <= $hi ) { my $mid = int( ( $lo + $hi ) / 2 ); if( $X->[ $M[ $mid ] ] < $X->[ $i ] ) { $lo = $mid + 1; } else{ $hi = $mid - 1; } } ## After searching, lo is 1 greater than the length of the lon +gest prefix of X[i] my $newL = $lo; ## The predecessor of X[i] is the last index of the subsequenc +e of length newL-1 $P[ $i ] = $M[ $newL - 1 ]; $M[ $newL ] = $i; if( $newL > $L ) { ## If we found a subsequence longer than any we've found y +et, update L $L = $newL; } } ## Reconstruct the longest increasing subsequence my @S; my $k = $M[ $L ]; for my $i ( reverse 0 .. $L-1 ) { $S[ $i ] = $X->[ $k ]; $k = $P[ $k ]; } return @S; } my @data = map int( rand $N ), 1 .. $N; my $start = time; my @best = LIS( \@data ); printf "best(%d): @best\n", scalar @best; printf "Took %.3f seconds\n", time() - $start; [download] Though I pass the data in via a reference for efficiency. I've been trying to wrap my brain around the mentioned Knuth optimisation without success. One comment on your code. This `return wantarray ? @S : [@S];` throws away the advantage of returning a reference, by constructing a list from the existing array, which is then used to construct another (anonymous) array, a reference to which is returned. Better to just do `return wantarray ? @S : \@S;` With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]