If array A is sorted by using the string cmp operator you can employ the binary search algorithm to determine if $item is in the array.

The basic idea is to see if the item is greater than or less than the middle item in the array, if its greater, binary search the top half of the array. If its less, binary search the bottom half. You'll eventually narrow it down to a one element array or hit it along the way at a midpoint. The runtime of this algorithm is O(lg n), versus the O(n) of using grep

A recursive implementation follows:

# Takes an array ref and a target item # Returns the index of the item in the array or # -1 if its not there. sub binsearch { my $arref = shift; my $target = shift; my $len = scalar @$arref; my $midpoint = floor( $len / 2); # Get the center, rounding down whe +n we have 1 element left we don't get undef my $cmp = $target cmp $arref->[$midpoint]; # make the comparison onc +e return $midpoint unless $cmp; # We've found it! return -1 if $len == 1; # Must not be there return binsearch ([$arref->[0..$midpoint],$target) if $cmp == -1; # +Its in the lower half, check to the midpoint because its rounded down return binsearch ([$arref->[$midpoint+1..$len-1],$target) if $cmp == + 1; # Its in the upper half, check to the midpoint +1 for reason stat +ed above croak "Something really weird happened. Might want to run make test +on perl itsself."; }

This will still take time with 800k elements. If speed is really an issue consider PDL or doing it in C.


In reply to Re: performance issues with grepping large sorted arrays by Trizor
in thread performance issues with grepping large sorted arrays by HarshaHegde

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.