in reply to Searching parallel arrays.

I'd just do a standard merge but throw away the results as you go unless the next output is 1+previous.

#!/usr/bin/perl -w use strict; my @list= ( # A more interesting test: [ 101, 105, 204, 312 ], [ 100, 313, 409 ], [ 205, 206, 211, 315 ], [ 106, 207, 210, 314 ], ); # Temporarily get list numbers sorted by first item my @next= sort { $list[$a][0] <=> $list[$b][0] } 0..$#list; # $p is the index of the list to pick from next # $next[$p] says which list to pick from after list $p ( my $p, @next[@next] )= ( @next, -1 ); my @run; # Our consecutive run so far my @from; # Which list we selected each of the above from my @best= ( undef ) x 2; # The run to beat while( 1 ) { # Add smallest remaining item to our run push @run, shift @{ $list[$p] }; push @from, $p; # If we've run out of items or if this one isn't consecutive if( ! defined( $run[-1] ) || 1 < @run && $run[-2]+1 < $run[-1] ) { # then see if the now-finished run is a winner pop @from; if( @best <= @from ) { # We've at least tied, so report if( @best < @from ) { # A new size reached! print "$#run-int runs:$/"; } @best= @run; pop @best; print "$best[0] .. $best[-1] ( @from )$/"; } last if ! defined $run[-1]; # We ran out of items # The next run starts with the item we just added @run= $run[-1]; @from= $p; } elsif( 1 < @run && $run[-1] <= $run[-2] ) { # The lists weren't sorted or our code is broken die "Abort; out of sequence: @run$/"; } # Move $list[$p] now that it has a bigger first item my $n= $next[$p]; if( ! @{ $list[$p] } ) { # Empty. Don't insert $list[$p]. $next[$p]= -2; # Not really needed $p= $n; } elsif( $n < 0 || $list[$p][0] < $list[$n][0] ) { # Do $list[$p] again } else { my $x= $n; my $y= $next[$x]; while( 0 <= $y && $list[$y][0] < $list[$p][0] ) { $y= $next[ $x= $y ]; } # Do list[$p] after $list[$x], before $list[$y]: $next[$x]= $p; $next[$p]= $y; $p= $n; } }

- tye        

Replies are listed 'Best First'.
Re^2: Searching parallel arrays. (just merge)
by BrowserUk (Patriarch) on Dec 09, 2006 at 09:48 UTC

    I don't quite follow how you track which array to read from next (yet), but godamn it's fast.

    The following times yours, Limbic~Region's and mine. The data is 1 .. $N (minus anything with a 9 to stop it being a single contiguous sequence), randomly distributed across 4 arrays. The only tweaks I've made are to accumulate the results into an array rather than printing them out.

    c:\test>588553-buk -N=100000 1 trial of Four arrays and a total of 100000 (1.756s total) Found 6561 runs c:\test>588553-lr -N=100000 1 trial of Four arrays and a total of 100000 (1.164s total) Found 6561 runs c:\test>588553-tye -N=100000 1 trial of Four arrays and a total of 100000 (507.209ms total) Found 6561 runs

    By way of comparison, I also tested one of the 'sort them all together' solutions (grinder's, with apologies to the other contributors!) against yours, though I had to drop $N to stop it from blowing the memory:

    c:\test>588553-gdr -N=10000 1 trial of Four arrays and a total of 10000 (4.491s total) Found 729 runs c:\test>588553-tye -N=10000 1 trial of Four arrays and a total of 10000 (55.570ms total) Found 729 runs

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      @next is a linked list of the lists sorted by their first remaining item. So $p is the (index of the) list to pick from (with the smallest item). $next[$p] is the (index of the) next list, the one with the next smallest first remaining item. $next[$next[$p]] is the list after that. And -1 marks the end of the linked list.

      After processing one item, we move that list's index in the linked list by moving down the linked list until we find the spot where its new first remaining item falls between two other lists' first remaining items.

      The linked list means that we don't need to shift a whole block of array elements to do this "delete and re-insert later in the list" operation.

      Of course, if you used a plain array, then you could use a binary search for finding the next place to insert.

      And, since this is Perl, the fastest solution will likely be the one that involves the fewest opcodes instead of the one with the smallest "big O" complexity. If I were writing this in C, I'd use a heap; but those are rarely the fastest solution in Perl because slow opcode dispatch makes the more complex algorithm too heavy of a speed penalty -- you'd need a huge number of lists for a heap to pay off speed-wise in Perl.

      - tye