in reply to Searching parallel arrays.

BrowserUk,
For reasons I don't want to go into, I don't currently have the ability run or test any code so these are ideas you would have to implement yourself:

Idea 1:
Use a modified merge algorithm to efficiently copy all values into a single sorted array. In doing so, create a parallel hash that knows which array the value came from. Walking the sorted array to find spans of 4 should be straight forward. I believe this is a refinement jbert's idea.

Idea 2:
This is more memory friendlier but will likely be slower. Use a binary search with a modified merge algorithm. You will need to keep track track of your position in each array. This will keep track of a moving end point of your binary search as well as allow you to walk your arrays as a merge algorithm would. Find the lowest value using the current positions, do a binary search over each array looking for the next value. I think you can work out from here the rest of the details.

Minor wording update.

Cheers - L~R

Replies are listed 'Best First'.
Re^2: Searching parallel arrays.
by BrowserUk (Patriarch) on Dec 08, 2006 at 21:50 UTC

    Both these are along the lines of what I was thinking of when I posted, but couldn't wrap my brain around the implementation.

    Per your /msg, I agree that as the individual arrays are already sorted, there is no need for the binary search, and so a merge algorithm with a small buffer can step through the arrays picking out the next highest value and tracking where it's got to in each array. In theory, this can be extended to handle any number of arrays, though this isn't (currently) a requirement, but my code below does do that.

    It struck me that this is probably an algorithm to which tye's Algorithm::Loops::loops code would lend itself, and I was hoping that someone who groks that, (which still defeats me for some reason?), would post some code. I understand your frustration at your current Perless state.

    The problem I had with implementing it revolved around how to detect the end of the main loop. Ie. How to detect when the indexes into the individual arrays had reached the end--across all of the arrays. I eventually came up with a grep in a scalar context solution that works, but I'm still not sure that there isn't a better way? I keep thinking about initialising the array indexes to $#array and stepping them backwards, so that I can use their transition to zero to detect the ends...

    Anyway, this is what I came up with so far. I'm not sure how it compares efficiency-wise with the sort them all together solutions, but it does avoid the need for a large AoA or AoCompositeTags. In theory it's O(n), but as we have both meditated on in the past, O(n) Perl code isn't necessarially quicker than O(n log n) C-via-builtins code.

    The tricky bits are:

    1. how to handle min to return the indexes whilst also handling running off the end of the arrays

      (the reduce line).

    2. the nested indexing in the push line.

      Trying to handle that in C code is giving me nightmares :) I do love Perl.

    Update: Added fixes 1 & 2 found by Not a Number++ below

    #! perl -slw use strict; use Data::Dumper; use List::Util qw[ reduce ]; use constant MAXINT => 2**32; sub seq { my $aoa = shift; my @is = ( 0 ) x @$aoa; my @seqs; my @seq; sub seq { my $aoa = shift; my @is = ( 0 ) x @$aoa; my @seqs; my @seq; while( grep{ $is[ $_ ] < @{ $aoa->[ $_ ] } } 0 .. $#is ) { # print "\n@is"; # print "@$_" for @seq; my $iLow = reduce { ( $aoa->[ $a ][ $is[ $a ] ] || MAXINT ) < ( $aoa->[ $b ][ $is[ $b ] ] || MAXINT ) ? $a : $b } 0 .. $#is; push @seq, [ $aoa->[ $iLow ][ $is[ $iLow ] ], $iLow, $is[ $iLo +w ] ]; ++$is[ $iLow ]; if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { my $temp = pop @seq; ## Fix 1a push @seqs, [ @seq ] if @seq >= 4; @seq = $temp; ## Fix 1b } } push @seqs, \@seq if @seq >= 4; ## Fix 2 return @seqs; } my @data = ( [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314 ], ); my @seqs = seq( \@data ); print map{ sprintf "Value: %d at array:%d position:%d\n", @$_ } @$_ for @seqs; __END__ c:\test>588553 Value: 204 at array:0 position:1 Value: 205 at array:2 position:0 Value: 206 at array:2 position:1 Value: 207 at array:3 position:0 Value: 312 at array:0 position:2 Value: 313 at array:1 position:1 Value: 314 at array:3 position:2 Value: 315 at array:2 position:2

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      BrowserUk,
      I haven't had a chance to optimize this or even to test it thoroughly but I wanted to share. It is commented but if you need further explanation, please don't hesitate.

      I didn't use List::Util's reduce() because apparently it can handle multiple return values. The docs (Returns the result of the last call to BLOCK.) should probably be clarified on this point

      Cheers - L~R

        Nice++ I like that you've made it an iterator. I'll do some comparisons with mine and one of the sort them together solutions and see what's what as far as efficiency is concerned.

        I didn't use List::Util's reduce() because apparently it can['t] handle multiple return values.

        I guess it's pretty much the nature of the beast that reduce can only handle one return value as it becomes $a for the next iteration.

        However, I have worked around this in the past by using an anonmous array.

        @data = map int( rand 100 ), 1.. 100; $max = ( reduce{ $a->[0] > $b ? $a : ( $a->[0] = $b, $a ) } [0], @data )->[0]; print $max;; 96

        Of course that's a useless use of the technique, but it demonstrates it and you can put anything else inside the anon array that you need to carry around.

        reduce is biggest discovery from playing with functional languages. I find it, and it's named variants just so useful. I really think that it deserves inclusion directly into the langauge--which would be possible for 5.10 now they've added support for adding new stuff into core.

        I'm also hoping for great things as far as the efficiency of function calls goes in 5.10, which I read somewhere was going to be improved--but I cannot now find where. It might make functional composition ala tmortel's meditations, and HOP efficient enough to be usable for something other than interestign demonstrations. Did you here anything about this?

        I'll probably communicate the finding of whatever benchmarking I do offline if your interested, unless someone else reading this speaks up and asks me to post it.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

      Hmmm, here are some test cases where your code appears to fail.

      [ 7, 100, 204, 312 ], [ 8, 102, 313, 409 ], [ 9, 205, 206, 315 ], [ 10, 207, 210, 314 ], [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314, 401, 402, 403, 404 ], [ 1, 6, 11 ], [ 2, 7, 12 ], [ 3, 8, 13 ], [ 4, 9, 14 ],

      I'm off out now, so I've got no time to look for a pattern (or to attempt to understand your code :-)...

      Quick update: Limbic~Region's code, above, gets these right.

        Thanks for this. You've found two problems, one of which I had already discovered and solved, the other I had not.

        The first was that this code

        if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { pop @seq; if( @seq >= 4 ) { push @seqs, [ @seq ]; } undef @seq;

        discarded the non sequential value. If that was the start of a new sequence, the new sequence was missed or truncated. The fix is:

        if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { my $temp = pop @seq; if( @seq >= 4 ) { push @seqs, [ @seq ]; } @seq = $temp;

        The second is that if the last value of the set is also the last value of a sequence, and that sequence is also the minimal length (4 in this case), that sequence is not saved. I'm not sure how to fix this yet.

        Update: Not as clean as I would hope to achieve, but a final check for a complient sequence outside the while loop fixes the second problem. I'll update the code above to reflect the fix.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.