Re^2: Searching parallel arrays.

Both these are along the lines of what I was thinking of when I posted, but couldn't wrap my brain around the implementation.

Per your /msg, I agree that as the individual arrays are already sorted, there is no need for the binary search, and so a merge algorithm with a small buffer can step through the arrays picking out the next highest value and tracking where it's got to in each array. In theory, this can be extended to handle any number of arrays, though this isn't (currently) a requirement, but my code below does do that.

It struck me that this is probably an algorithm to which tye's Algorithm::Loops::loops code would lend itself, and I was hoping that someone who groks that, (which still defeats me for some reason?), would post some code. I understand your frustration at your current Perless state.

The problem I had with implementing it revolved around how to detect the end of the main loop. Ie. How to detect when the indexes into the individual arrays had reached the end--across all of the arrays. I eventually came up with a grep in a scalar context solution that works, but I'm still not sure that there isn't a better way? I keep thinking about initialising the array indexes to $#array and stepping them backwards, so that I can use their transition to zero to detect the ends...

Anyway, this is what I came up with so far. I'm not sure how it compares efficiency-wise with the sort them all together solutions, but it does avoid the need for a large AoA or AoCompositeTags. In theory it's O(n), but as we have both meditated on in the past, O(n) Perl code isn't necessarially quicker than O(n log n) C-via-builtins code.

The tricky bits are:

how to handle min to return the indexes whilst also handling running off the end of the arrays
(the reduce line).
the nested indexing in the push line.
Trying to handle that in C code is giving me nightmares :) I do love Perl.

Update: Added fixes 1 & 2 found by Not a Number++ below

#! perl -slw
use strict;
use Data::Dumper;
use List::Util qw[ reduce ];

use constant MAXINT => 2**32;

sub seq {
    my $aoa = shift;
    my @is = ( 0 ) x @$aoa;
    my @seqs;
    my @seq;

sub seq {
    my $aoa = shift;
    my @is = ( 0 ) x @$aoa;
    my @seqs;
    my @seq;

    while( grep{ $is[ $_ ] < @{ $aoa->[ $_ ] } } 0 .. $#is ) {
#        print "\n@is";
#        print "@$_" for @seq;
        my $iLow  = reduce {
            ( $aoa->[ $a ][ $is[ $a ] ] || MAXINT )
            <
            ( $aoa->[ $b ][ $is[ $b ] ] || MAXINT )
            ? $a : $b
        } 0 .. $#is;

        push @seq, [ $aoa->[ $iLow ][ $is[ $iLow ] ], $iLow, $is[ $iLo
+w ] ];
        ++$is[ $iLow ];

        if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) {
            my $temp = pop @seq;                       ## Fix 1a
            push @seqs, [ @seq ] if @seq >= 4;
            @seq = $temp;                              ## Fix 1b
        }
    }
    push @seqs, \@seq if @seq >= 4;                    ## Fix 2
    return @seqs;
}

my @data = (
    [ 100, 204, 312 ],
    [ 102, 313, 409 ],
    [ 205, 206, 315 ],
    [ 207, 210, 314 ],
);

my @seqs = seq( \@data );

print map{
    sprintf "Value: %d at array:%d position:%d\n", @$_
} @$_ for @seqs;

__END__
c:\test>588553
Value: 204 at array:0 position:1
Value: 205 at array:2 position:0
Value: 206 at array:2 position:1
Value: 207 at array:3 position:0

Value: 312 at array:0 position:2
Value: 313 at array:1 position:1
Value: 314 at array:3 position:2
Value: 315 at array:2 position:2
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Comment on Re^2: Searching parallel arrays. Select or Download Code

Replies are listed 'Best First'.
Re^3: Searching parallel arrays. by Limbic~Region (Chancellor) on Dec 09, 2006 at 00:32 UTC
BrowserUk, I haven't had a chance to optimize this or even to test it thoroughly but I wanted to share. It is commented but if you need further explanation, please don't hesitate. Read more... (3 kB) I didn't use List::Util's reduce() because apparently it can handle multiple return values. The docs (Returns the result of the last call to BLOCK.) should probably be clarified on this point Cheers - L~R	[reply] [d/l]
Re^4: Searching parallel arrays. by BrowserUk (Patriarch) on Dec 09, 2006 at 02:12 UTC
Nice++ I like that you've made it an iterator. I'll do some comparisons with mine and one of the sort them together solutions and see what's what as far as efficiency is concerned. I didn't use List::Util's reduce() because apparently it can['t] handle multiple return values. I guess it's pretty much the nature of the beast that `reduce` can only handle one return value as it becomes `$a` for the next iteration. However, I have worked around this in the past by using an anonmous array. `@data = map int( rand 100 ), 1.. 100; $max = ( reduce{ $a->[0] > $b ? $a : ( $a->[0] = $b, $a ) } [0], @data )->[0]; print $max;; 96` [download] Of course that's a useless use of the technique, but it demonstrates it and you can put anything else inside the anon array that you need to carry around. `reduce` is biggest discovery from playing with functional languages. I find it, and it's named variants just so useful. I really think that it deserves inclusion directly into the langauge--which would be possible for 5.10 now they've added support for adding new stuff into core. I'm also hoping for great things as far as the efficiency of function calls goes in 5.10, which I read somewhere was going to be improved--but I cannot now find where. It might make functional composition ala tmortel's meditations, and HOP efficient enough to be usable for something other than interestign demonstrations. Did you here anything about this? I'll probably communicate the finding of whatever benchmarking I do offline if your interested, unless someone else reading this speaks up and asks me to post it. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re^5: Searching parallel arrays. by Limbic~Region (Chancellor) on Dec 09, 2006 at 02:42 UTC
BrowserUk, I had originally always intended it to be an iterator but it would have made describing the algorithm more difficult. I made no effort at optimization so it can probably be improved greatly. I have submitted a bug report on List::Util indicating that the documentation should be explicit about the single return value or the implementation should be changed. I suspect you are right and the fix is a doc update. I have read through the perldeltas for 5.9.0 - 5.9.4 and did not see anything regarding sub call speedups. I am interested in Benchmark results using real data. One optimization could be made by only handling 4 arrays and using a hardcode @cont of 4 instead of both being user defined. You mentioned earlier not understanding tye's Algorithm::Loops. I won't pretend to understand his code either but I do understand the methodology. If you haven't seen Arbitrarily Nested Loops and NestedLoops and the Odometer Model, they may help explain it. Cheers - L~R	[reply]
Re^3: Searching parallel arrays. by Not_a_Number (Prior) on Dec 09, 2006 at 19:29 UTC
Hmmm, here are some test cases where your code appears to fail. `[ 7, 100, 204, 312 ], [ 8, 102, 313, 409 ], [ 9, 205, 206, 315 ], [ 10, 207, 210, 314 ], [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314, 401, 402, 403, 404 ], [ 1, 6, 11 ], [ 2, 7, 12 ], [ 3, 8, 13 ], [ 4, 9, 14 ],` [download] I'm off out now, so I've got no time to look for a pattern (or to attempt to understand your code :-)... Quick update: Limbic~Region's code, above, gets these right.	[reply] [d/l]
Re^4: Searching parallel arrays. by BrowserUk (Patriarch) on Dec 09, 2006 at 22:20 UTC
Thanks for this. You've found two problems, one of which I had already discovered and solved, the other I had not. The first was that this code `if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { pop @seq; if( @seq >= 4 ) { push @seqs, [ @seq ]; } undef @seq;` [download] discarded the non sequential value. If that was the start of a new sequence, the new sequence was missed or truncated. The fix is: `if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { my $temp = pop @seq; if( @seq >= 4 ) { push @seqs, [ @seq ]; } @seq = $temp;` [download] The second is that if the last value of the set is also the last value of a sequence, and that sequence is also the minimal length (4 in this case), that sequence is not saved. ~~I'm not sure how to fix this yet.~~ Update: Not as clean as I would hope to achieve, but a final check for a complient sequence outside the while loop fixes the second problem. I'll update the code above to reflect the fix. Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]