BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Given four arrays, each containing a sorted (ascending) sequence of non-contiguous, unique integers (unique within each array and across all arrays), search those arrays to find sequence(s) (one or more) of 4 contiguous values.

Example:

my @a1 = ( 100, 204, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); ## Two contiguous sequences of 4 values: 204 .. 207 & 312 .. 315

Restructure as AoA or HoA or any other way that retains set/position information.

Easy if you can just sort them all into a single array, but I need to retain knowledge of which set each value in each contiguous sequence comes from, and it's position?

The four sets could be quite large, but not so large that memory for 2 or 3 copies would be a problem.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Searching parallel arrays.
by jbert (Priest) on Dec 08, 2006 at 11:22 UTC
    Why not do just what you said?

    Sort them into a single array and retain knowledge of where they came from.

    i.e. use a hash (or even an array) to map number -> [ $ref_to_source_array, $index_in_source_array ].

    Then you can look for a sequence in the sorted array and, for each number in the sequence, look up where it came from.

    No code, but it's straightforward I think.

    Update: If the memory consumption is excessive, you could encode an identifier for the source array and the index in a single scalar, e.g. "a1:1", "a3:4".

    You should then be using about 4 times your original memory, if that's too much you could destroy your original arrays and recreate them afterwards from the lookup info :-)

      And some (mediocre :-) code:
      #!/usr/bin/perl -w use strict; my @source_data = ( [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314 ], ); my @sorted_numbers = sort map { @{$_} } @source_data; # Using hash for the lookup. If your N numbers are fairly dense over # 0..N then use an array for reduced mem consumption my %from_whence; foreach my $source_array (@source_data) { my $index = 0; foreach my $number (@$source_array) { $from_whence{$number} = [$source_array, $index]; ++$index; } }; # Look for 4 contig values my @recent; foreach my $number (@sorted_numbers) { push @recent, [ $number, $from_whence{$number} ]; next unless scalar @recent > 4; shift @recent; check_and_handle_contig(\@recent); } exit 0; sub check_and_handle_contig { my $numbers_info = shift; my $lastnum = undef; foreach my $numinfo (@$numbers_info) { my $num = $numinfo->[0]; if ($lastnum) { return unless $num == $lastnum + 1; } $lastnum = $num; } print "Found sequence:\n"; foreach my $numinfo (@$numbers_info) { my $num = $numinfo->[0]; my $info = $numinfo->[1]; print "$num from $info->[0]:$info->[1]\n"; } }
      which prints:
      Found sequence: 204 from ARRAY(0x814bc28):1 205 from ARRAY(0x8188e94):0 206 from ARRAY(0x8188e94):1 207 from ARRAY(0x8188ed0):0 Found sequence: 312 from ARRAY(0x814bc28):2 313 from ARRAY(0x8188c30):1 314 from ARRAY(0x8188ed0):2 315 from ARRAY(0x8188e94):2
      Update: Limbic~Region++ noted that the sort in the above (more mediocre than I thought :-) code should be numeric, rather than the default sort. Update 2:And there's another bug, found by Not_a_Number, fixes below.

        Not sure why, but I get no output from your code when I use these test data:

        my @source_data = ( [ 1, 9, 17 ], [ 2, 10, 18 ], [ 3, 11, 19 ], [ 4, 12, 20 ], );
Re: Searching parallel arrays.
by grinder (Bishop) on Dec 08, 2006 at 14:17 UTC

    Here's make take on the problem. I haven't looked at the other solutions, so it will be interesting to see how the others compare.

    Since the numbers are contiguous unique across all arrays, a simple hash will do the trick, whose key is the orginal number in the source array, and the value is a reference to a list that stores an identifier that stands in place of the array, and the offset within the array.

    I then take the sequence numbers from an array, and look for sequences of four. When I find one, I see how far it can be extended whilst keeping the sequence contiguous.

    #! /usr/local/bin/perl -w use strict; use constant CONTIG => 4; my @a1 = ( 100, 204, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); my %seen; my $off = 0; $seen{$_} = [1, $off++] for @a1; $off = 0; $seen{$_} = [2, $off++] for @a2; $off = 0; $seen{$_} = [3, $off++] for @a3; $off = 0; $seen{$_} = [4, $off++] for @a4; my @seq = sort {$a <=> $b} keys %seen; my $idx = 0; while ($idx <= $#seq - (CONTIG-1) ) { if ($seq[$idx] + CONTIG-1 < $seq[$idx + CONTIG-1]) { ++$idx; } else { # at least 4 contiguous (CONTIG) sequence numbers # see how far we can extend the range my $range = CONTIG-1; while ($idx+$range+1 <= $#seq and $seq[$idx]+$range+1 == $seq[$idx+$range+1] ) { ++$range; } print "saw $seq[$idx]..$seq[$idx+$range]\n\t", join( "\n\t", map {pretty($seen{$_})} @seq[$idx .. $idx+$range] ), "\n"; $idx += $range; } } sub pretty { my $r = shift; return "array $r->[0], offset $r->[1]"; }

    (update: erm, and here's the output:)

    saw 204..207 array 1, offset 1 array 3, offset 0 array 3, offset 1 array 4, offset 0 saw 312..315 array 1, offset 2 array 2, offset 1 array 4, offset 2 array 3, offset 2

    • another intruder with the mooring in the heart of the Perl

      Very neat++. Took me a while to grok the indexing, but it's really cool.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Searching parallel arrays.
by Melly (Chaplain) on Dec 08, 2006 at 11:34 UTC

    Hmm... I'd be tempted to copy them to a single array, but as anon. arrays of value + source - e.g.:

    foreach(@a1){ push @all, ['a1', $_]; } foreach(@a2){ push @all, ['a2', $_]; } etc...

    Then you can sort on @all->[1] and still have access to the original array name (oh, and thinking about it, you could just create references to the original array elements, rather than copying the values).

    <update>mediocre minds think alike...</update>

    map{$a=1-$_/10;map{$d=$a;$e=$b=$_/20-2;map{($d,$e)=(2*$d*$e+$a,$e**2 -$d**2+$b);$c=$d**2+$e**2>4?$d=8:_}1..50;print$c}0..59;print$/}0..20
    Tom Melly, pm@tomandlu.co.uk
Re: Searching parallel arrays.
by Corion (Patriarch) on Dec 08, 2006 at 14:09 UTC

    I have the idea of implementing something like a modified Boyer-Moore search on your data structure, but your description and data don't match:

    The rule is each array contains a sorted (ascending) sequence of non-contiguous, unique integers, but my @a3 = ( 205, 206, 315 ); contains the two contiguous numbers 205 and 206.

    If the data is wrong and the description is right, my algorithm would be to

    1. Take a look at the highest number of the four numbers at the start of the array. This highest number must be present in the sequence, as every array can only contribute one number/entry to the sequence, per sequence. This assumption/consequent is wrong, see Not a Number's reply below.
    2. Then, move forward in the other three arrays up to that number-3, by using a binary search.
    3. Then, look at the number found by the binary search, or if not found the next greater entry. By definition, there are no duplicates, so the elements in the arrays are all different and all equal or greater than $number-3.
    4. Look at the four numbers now. If they make up a sequence all is well. Otherwise, remove all the previous smaller elements from the arrays and start over at step 1.

    I think this algorithm optimizes discarding non-matches over finding matches.

      I'm not sure I fully understand, but wouldn't that approach fail for a configuration like this:

      my @a1 = ( 1, 3, 5 ); my @a2 = ( 2, 4, 6 ); my @a3 = ( 205, 207, 315 ); my @a4 = ( 206, 208, 314 );

        Indeed - my approach will fail for such configurations, thanks for pointing that out!

      Corion++ You're right. My description was lacking in clarity. I was trying to indicate that any sequence of four values found would definitely be spread across 2 or more of the arrays.

      In this case, the example is king.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Searching parallel arrays.
by Limbic~Region (Chancellor) on Dec 08, 2006 at 13:47 UTC
    BrowserUk,
    For reasons I don't want to go into, I don't currently have the ability run or test any code so these are ideas you would have to implement yourself:

    Idea 1:
    Use a modified merge algorithm to efficiently copy all values into a single sorted array. In doing so, create a parallel hash that knows which array the value came from. Walking the sorted array to find spans of 4 should be straight forward. I believe this is a refinement jbert's idea.

    Idea 2:
    This is more memory friendlier but will likely be slower. Use a binary search with a modified merge algorithm. You will need to keep track track of your position in each array. This will keep track of a moving end point of your binary search as well as allow you to walk your arrays as a merge algorithm would. Find the lowest value using the current positions, do a binary search over each array looking for the next value. I think you can work out from here the rest of the details.

    Minor wording update.

    Cheers - L~R

      Both these are along the lines of what I was thinking of when I posted, but couldn't wrap my brain around the implementation.

      Per your /msg, I agree that as the individual arrays are already sorted, there is no need for the binary search, and so a merge algorithm with a small buffer can step through the arrays picking out the next highest value and tracking where it's got to in each array. In theory, this can be extended to handle any number of arrays, though this isn't (currently) a requirement, but my code below does do that.

      It struck me that this is probably an algorithm to which tye's Algorithm::Loops::loops code would lend itself, and I was hoping that someone who groks that, (which still defeats me for some reason?), would post some code. I understand your frustration at your current Perless state.

      The problem I had with implementing it revolved around how to detect the end of the main loop. Ie. How to detect when the indexes into the individual arrays had reached the end--across all of the arrays. I eventually came up with a grep in a scalar context solution that works, but I'm still not sure that there isn't a better way? I keep thinking about initialising the array indexes to $#array and stepping them backwards, so that I can use their transition to zero to detect the ends...

      Anyway, this is what I came up with so far. I'm not sure how it compares efficiency-wise with the sort them all together solutions, but it does avoid the need for a large AoA or AoCompositeTags. In theory it's O(n), but as we have both meditated on in the past, O(n) Perl code isn't necessarially quicker than O(n log n) C-via-builtins code.

      The tricky bits are:

      1. how to handle min to return the indexes whilst also handling running off the end of the arrays

        (the reduce line).

      2. the nested indexing in the push line.

        Trying to handle that in C code is giving me nightmares :) I do love Perl.

      Update: Added fixes 1 & 2 found by Not a Number++ below

      #! perl -slw use strict; use Data::Dumper; use List::Util qw[ reduce ]; use constant MAXINT => 2**32; sub seq { my $aoa = shift; my @is = ( 0 ) x @$aoa; my @seqs; my @seq; sub seq { my $aoa = shift; my @is = ( 0 ) x @$aoa; my @seqs; my @seq; while( grep{ $is[ $_ ] < @{ $aoa->[ $_ ] } } 0 .. $#is ) { # print "\n@is"; # print "@$_" for @seq; my $iLow = reduce { ( $aoa->[ $a ][ $is[ $a ] ] || MAXINT ) < ( $aoa->[ $b ][ $is[ $b ] ] || MAXINT ) ? $a : $b } 0 .. $#is; push @seq, [ $aoa->[ $iLow ][ $is[ $iLow ] ], $iLow, $is[ $iLo +w ] ]; ++$is[ $iLow ]; if( @seq > 1 and $seq[ -1 ][ 0 ] > ( $seq[ -2 ][ 0 ] + 1 ) ) { my $temp = pop @seq; ## Fix 1a push @seqs, [ @seq ] if @seq >= 4; @seq = $temp; ## Fix 1b } } push @seqs, \@seq if @seq >= 4; ## Fix 2 return @seqs; } my @data = ( [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314 ], ); my @seqs = seq( \@data ); print map{ sprintf "Value: %d at array:%d position:%d\n", @$_ } @$_ for @seqs; __END__ c:\test>588553 Value: 204 at array:0 position:1 Value: 205 at array:2 position:0 Value: 206 at array:2 position:1 Value: 207 at array:3 position:0 Value: 312 at array:0 position:2 Value: 313 at array:1 position:1 Value: 314 at array:3 position:2 Value: 315 at array:2 position:2

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        BrowserUk,
        I haven't had a chance to optimize this or even to test it thoroughly but I wanted to share. It is commented but if you need further explanation, please don't hesitate.

        I didn't use List::Util's reduce() because apparently it can handle multiple return values. The docs (Returns the result of the last call to BLOCK.) should probably be clarified on this point

        Cheers - L~R

        Hmmm, here are some test cases where your code appears to fail.

        [ 7, 100, 204, 312 ], [ 8, 102, 313, 409 ], [ 9, 205, 206, 315 ], [ 10, 207, 210, 314 ], [ 100, 204, 312 ], [ 102, 313, 409 ], [ 205, 206, 315 ], [ 207, 210, 314, 401, 402, 403, 404 ], [ 1, 6, 11 ], [ 2, 7, 12 ], [ 3, 8, 13 ], [ 4, 9, 14 ],

        I'm off out now, so I've got no time to look for a pattern (or to attempt to understand your code :-)...

        Quick update: Limbic~Region's code, above, gets these right.

Re: Searching parallel arrays.
by johngg (Canon) on Dec 08, 2006 at 14:10 UTC
    This seems to do what you want without too much thrashing around. It should cope with overlapping sets (5 or more consecutive numbers) although I haven't tested that.

    use strict; use warnings; use Data::Dumper; my @a1 = ( 100, 204, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); ## Two contiguous sequences of 4 values: 204 .. 207 & 312 .. 315 my $arrayNo = -1; my @ints = sort {$a->[0] <=> $b->[0]} map { my $raOrig = $_; my @items = (); push @items, [$raOrig->[0]->[$_], qq{$raOrig->[1]:$_}] for 0 .. $#{$raOrig->[0]}; @items; } map { $arrayNo ++; [$_, $arrayNo] } \@a1, \@a2, \@a3, \@a4; my @sets; for my $window (0 .. $#ints - 3) { push @sets, [@ints[$window .. $window + 3]] if $ints[$window]->[0] == $ints[$window + 3]->[0] - 3; } print Dumper \@sets;

    The Dumper output is

    $VAR1 = [ [ [ 204, '0:1' ], [ 205, '2:0' ], [ 206, '2:1' ], [ 207, '3:0' ] ], [ [ 312, '0:2' ], [ 313, '1:1' ], [ 314, '3:2' ], [ 315, '2:2' ] ] ];

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Removed superfluous variable $elemNo left over from development

Re: Searching parallel arrays.
by pajout (Curate) on Dec 08, 2006 at 15:13 UTC
    When integers 1,2,3,4,5 will be spread accross the arrays, do you expect one (which?) or two contiguous sequences?

      Good question++. Longer sequences are not expected but could occur.

      They can be reported as a single long sequence, or as multiple sequences of 4, whichever is easier for the algorithm.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Thanks for that clarification, but you still haven't answered Corion's equally good objection++ above, namely:

        The rule is each array contains a sorted (ascending) sequence of non-contiguous, unique integers, but my @a3 = ( 205, 206, 315 ); contains the two contiguous numbers 205 and 206.

        Should we be reasoning based on your descripition of the problem, or on your supplied dataset?

        I'v tried to code it, without reading codes posted previously, just for enjoy. I'v been curious, how many rows it will has. O.K., it is not very perlish... Idea was very simple - zip the four arrays to one stream, fill circle buffer and test if last value of the buffer - 3 is equal to first value. Damn, 67 rows.
        use strict; use warnings; my @a1 = ( 100, 204, 208, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); my $a = [\@a1,\@a2,\@a3,\@a4]; my @ai = (0, 0, 0, 0); my ($a1a2, $a3a4) = (getLess(0), getLess(2)); my @circle = (getNext(),getNext(),getNext()); my $ci = 3; while (defined($circle[$ci] = getNext())) { if ($circle[$ci][2]-3 == $circle[($ci+1)%4][2]) { for (my $i = 1; $i < 5; $i++) { print $circle[($ci+$i)%4][2]. ' ['.$circle[($ci+$i)%4][0].','.$circle[($ci+$i)%4][1].'] '; } print "\n"; } $ci = ($ci+1)%4; } sub getLess { my $offset = shift; my $ret; if (defined $$a[$offset][$ai[$offset]]) { if (defined $$a[$offset+1][$ai[$offset+1]]) { $offset++ if $$a[$offset][$ai[$offset]] > $$a[$offset+1][$ai[$ +offset+1]]; } $ret = [$offset, $ai[$offset], $$a[$offset][$ai[$offset]]]; $ai[$offset]++; } else { $offset++; if (defined $$a[$offset][$ai[$offset]]) { $ret = [$offset, $ai[$offset], $$a[$offset][$ai[$offset]]]; $ai[$offset]++; } } return $ret; } sub getNext { my $ret; if (defined $a1a2) { if (defined $a3a4) { if ($$a1a2[2] < $$a3a4[2]) { $ret = $a1a2; $a1a2 = getLess(0); } else { $ret = $a3a4; $a3a4 = getLess(2); } } else { $ret = $a1a2; $a1a2 = getLess(0); } } else { if (defined $a3a4) { $ret = $a3a4; $a3a4 = getLess(2); } } return $ret; }
        updated: output is
        204 [0,1] 205 [2,0] 206 [2,1] 207 [3,0] 205 [2,0] 206 [2,1] 207 [3,0] 208 [0,2] 312 [0,3] 313 [1,1] 314 [3,2] 315 [2,2]
Re: Searching parallel arrays. (just merge)
by tye (Sage) on Dec 09, 2006 at 06:37 UTC

    I'd just do a standard merge but throw away the results as you go unless the next output is 1+previous.

    #!/usr/bin/perl -w use strict; my @list= ( # A more interesting test: [ 101, 105, 204, 312 ], [ 100, 313, 409 ], [ 205, 206, 211, 315 ], [ 106, 207, 210, 314 ], ); # Temporarily get list numbers sorted by first item my @next= sort { $list[$a][0] <=> $list[$b][0] } 0..$#list; # $p is the index of the list to pick from next # $next[$p] says which list to pick from after list $p ( my $p, @next[@next] )= ( @next, -1 ); my @run; # Our consecutive run so far my @from; # Which list we selected each of the above from my @best= ( undef ) x 2; # The run to beat while( 1 ) { # Add smallest remaining item to our run push @run, shift @{ $list[$p] }; push @from, $p; # If we've run out of items or if this one isn't consecutive if( ! defined( $run[-1] ) || 1 < @run && $run[-2]+1 < $run[-1] ) { # then see if the now-finished run is a winner pop @from; if( @best <= @from ) { # We've at least tied, so report if( @best < @from ) { # A new size reached! print "$#run-int runs:$/"; } @best= @run; pop @best; print "$best[0] .. $best[-1] ( @from )$/"; } last if ! defined $run[-1]; # We ran out of items # The next run starts with the item we just added @run= $run[-1]; @from= $p; } elsif( 1 < @run && $run[-1] <= $run[-2] ) { # The lists weren't sorted or our code is broken die "Abort; out of sequence: @run$/"; } # Move $list[$p] now that it has a bigger first item my $n= $next[$p]; if( ! @{ $list[$p] } ) { # Empty. Don't insert $list[$p]. $next[$p]= -2; # Not really needed $p= $n; } elsif( $n < 0 || $list[$p][0] < $list[$n][0] ) { # Do $list[$p] again } else { my $x= $n; my $y= $next[$x]; while( 0 <= $y && $list[$y][0] < $list[$p][0] ) { $y= $next[ $x= $y ]; } # Do list[$p] after $list[$x], before $list[$y]: $next[$x]= $p; $next[$p]= $y; $p= $n; } }

    - tye        

      I don't quite follow how you track which array to read from next (yet), but godamn it's fast.

      The following times yours, Limbic~Region's and mine. The data is 1 .. $N (minus anything with a 9 to stop it being a single contiguous sequence), randomly distributed across 4 arrays. The only tweaks I've made are to accumulate the results into an array rather than printing them out.

      c:\test>588553-buk -N=100000 1 trial of Four arrays and a total of 100000 (1.756s total) Found 6561 runs c:\test>588553-lr -N=100000 1 trial of Four arrays and a total of 100000 (1.164s total) Found 6561 runs c:\test>588553-tye -N=100000 1 trial of Four arrays and a total of 100000 (507.209ms total) Found 6561 runs

      By way of comparison, I also tested one of the 'sort them all together' solutions (grinder's, with apologies to the other contributors!) against yours, though I had to drop $N to stop it from blowing the memory:

      c:\test>588553-gdr -N=10000 1 trial of Four arrays and a total of 10000 (4.491s total) Found 729 runs c:\test>588553-tye -N=10000 1 trial of Four arrays and a total of 10000 (55.570ms total) Found 729 runs

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        @next is a linked list of the lists sorted by their first remaining item. So $p is the (index of the) list to pick from (with the smallest item). $next[$p] is the (index of the) next list, the one with the next smallest first remaining item. $next[$next[$p]] is the list after that. And -1 marks the end of the linked list.

        After processing one item, we move that list's index in the linked list by moving down the linked list until we find the spot where its new first remaining item falls between two other lists' first remaining items.

        The linked list means that we don't need to shift a whole block of array elements to do this "delete and re-insert later in the list" operation.

        Of course, if you used a plain array, then you could use a binary search for finding the next place to insert.

        And, since this is Perl, the fastest solution will likely be the one that involves the fewest opcodes instead of the one with the smallest "big O" complexity. If I were writing this in C, I'd use a heap; but those are rarely the fastest solution in Perl because slow opcode dispatch makes the more complex algorithm too heavy of a speed penalty -- you'd need a huge number of lists for a heap to pay off speed-wise in Perl.

        - tye        

Re: Searching parallel arrays.
by NiJo (Friar) on Dec 08, 2006 at 20:35 UTC
    A variation of the 'one array' theme might be effective:

    The input phase reads the arrays to switch on the N-th bit of a vector. After a run length encoding you are looking for every 2nd value (that represents a block of 1's) >3 .

    Depending on the expected hit density it pays to loose the relation to the input array and 'grep' it again.

      That's a really interesting idea.

      In this case I'm not sure its workable as, unlike my example, the integers in question can potentially take the full (unsigned) range of values--they are in fact addresses--so the bit vectors could get pretty large.

      Even if I scale the values (the addresses will always be at least 4-byte aligned), the range is still potentially 2**29 on a 32-bit cpu, which would make for some pretty sparse bitvectors. RLE might help, but then you loose the advantage of being able to use bitstring logical operations on them.

      Later on I might be able to fursther reduce the range by excluding a large chunks at the bottom (and maybe top) of memory, and I might be able to rely on 12 or even 16 byte alignment. That would hugely reduce the size of the bitvectors and coud bring this into the realms of feasability. Thanks.

      If anyone knows of how to obtain the lower and upper bounds of the dynamic memory of the current Perl process, I'd love to hear from them.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I understand that scaling issue. In this situation it makes sense to use a Bloom::Filter, basically a compressed bit vector. The algorithm scales well but produces false positives:

        http://www.perl.com/pub/a/2004/04/08/bloom_filters.html

        You need to loop through the arrays once to create the vector and a second time for the test [N-2..N+2].

Re: Searching parallel arrays.
by Anonymous Monk on Dec 11, 2006 at 10:35 UTC
    Hi, please find the attached code and its containing the samples outpu +ts. please let me know your comments. use strict; use warnings; ## variable declaration my ( $i, $j, $k); my ( @MergeArray,@NumArray, @SortArray, %ResultHash ); ## assigning values to variables $i = $j = $k = 0; @MergeArray = @NumArray = @SortArray = %ResultHash =(); ## given the Inputs my @a1 = ( 100, 204, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); ########################## PROGRAM START ######################## ## merging all arrays with names @MergeArray = ( 'a1', @a1,'a2', @a2,'a3', @a3,'a4', @a4 ); ## grepping numbers only @NumArray = grep!/^a/, @MergeArray; ## sorting numbers @SortArray = sort @NumArray; ## reading sorted numbers for( $i = 0; $i < @SortArray - 1; $i++ ){ my $no1 = $SortArray[$i]; ## one number compare with all other numbes for( $j = $i+1; $j < @SortArray; $j++){ my $no2 = $SortArray[$j]; my $diff = $no2 - $no1; ## getting contiguous differnces function ContiguousFun( $diff, $no1, $no2 ); } } ## printing results print "Contiguous sequences and the numbers appropriate array name and + position of the index\n"; foreach my $res (sort keys %ResultHash ){ ## you have to change the condition whatever you need next if($res < 2 or $res > 9); print "\nnumber $res\n"; print $ResultHash{$res}; } ########################## PROGRAM END ######################## ## sub function for 'Contiguous' sub ContiguousFun{ my $diff_no = shift; my $arrno1 = shift; my $arrno2 = shift; my $arrno1_info = GettingNameAndIndexVal( $arrno1 ); my $arrno2_info = GettingNameAndIndexVal( $arrno2 ); ## checking if diff_no exists or not if(exists $ResultHash{$diff_no}){ $ResultHash{$diff_no} = "$ResultHash{$diff_no}"."\n$arrno1 && +$arrno2 ==> $arrno1_info AND $arrno2_info"; }else{ $ResultHash{$diff_no} = "$arrno1 & $arrno2 ==> $arrno1_info AN +D $arrno2_info"; } } ## getting array names and index value function sub GettingNameAndIndexVal{ my $arrno = shift; my $arrayname =" "; foreach my $item ( @MergeArray ){ ## getting array name $arrayname = $item if( $item =~ /a/i ); ## initialized to index value if array name found $k = -1 if( $item =~ /a/i); ## processing numbers only not a array names ie, ingoring arra +y names next if( $item =~ /a/i); $k++; ## returning array name with index value return "$arrayname, $k" if( $item == $arrno ); } } __END__ sample inputs my @a1 = ( 100, 204, 312 ); my @a2 = ( 102, 313, 409 ); my @a3 = ( 205, 206, 315 ); my @a4 = ( 207, 210, 314 ); Sample Results Contiguous sequences and the numbers appropriate array name and positi +on of the index number 2 100 & 102 ==> a1, 0 AND a2, 0 204 && 206 ==> a1, 1 AND a3, 1 205 && 207 ==> a3, 0 AND a4, 0 312 && 314 ==> a1, 2 AND a4, 2 313 && 315 ==> a2, 1 AND a3, 2 number 3 204 & 207 ==> a1, 1 AND a4, 0 207 && 210 ==> a4, 0 AND a4, 1 312 && 315 ==> a1, 2 AND a3, 2 number 4 206 & 210 ==> a3, 1 AND a4, 1 number 5 205 & 210 ==> a3, 0 AND a4, 1 number 6 204 & 210 ==> a1, 1 AND a4, 1