in reply to performance issues with grepping large sorted arrays
If array A is sorted by using the string cmp operator you can employ the binary search algorithm to determine if $item is in the array.
The basic idea is to see if the item is greater than or less than the middle item in the array, if its greater, binary search the top half of the array. If its less, binary search the bottom half. You'll eventually narrow it down to a one element array or hit it along the way at a midpoint. The runtime of this algorithm is O(lg n), versus the O(n) of using grep
A recursive implementation follows:
# Takes an array ref and a target item # Returns the index of the item in the array or # -1 if its not there. sub binsearch { my $arref = shift; my $target = shift; my $len = scalar @$arref; my $midpoint = floor( $len / 2); # Get the center, rounding down whe +n we have 1 element left we don't get undef my $cmp = $target cmp $arref->[$midpoint]; # make the comparison onc +e return $midpoint unless $cmp; # We've found it! return -1 if $len == 1; # Must not be there return binsearch ([$arref->[0..$midpoint],$target) if $cmp == -1; # +Its in the lower half, check to the midpoint because its rounded down return binsearch ([$arref->[$midpoint+1..$len-1],$target) if $cmp == + 1; # Its in the upper half, check to the midpoint +1 for reason stat +ed above croak "Something really weird happened. Might want to run make test +on perl itsself."; }
This will still take time with 800k elements. If speed is really an issue consider PDL or doing it in C.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: performance issues with grepping large sorted arrays
by ikegami (Patriarch) on Jul 24, 2007 at 02:08 UTC | |
by mr_mischief (Monsignor) on Jul 24, 2007 at 16:34 UTC |