Binary search algorithm.

Re: Binary search algorithm.
by ikegami (Patriarch) on Aug 13, 2011 at 04:56 UTC

The variant below has two features yours doesn't:

It allows one to use any compare function, not just cmp.
It returns the index at which the element should be found when it is not found.

# Add $value to sorted @array, if it's not already there.
my $idx = binsearch { $a <=> $b } $value, @array;
splice(@array, ~$idx, 0, $value) if $idx < 0;

sub binsearch(&$\@) {
   my  $compare = $_[0];
   #my $value   = $_[1];
   my  $array   = $_[2];

   my $i = 0;
   my $j = $#$array;
   return $j if $j == -1;

   my $ap = do { no strict 'refs'; \*{caller().'::a'} };  local *$ap;
   my $bp = do { no strict 'refs'; \*{caller().'::b'} };  local *$bp;

   *$ap = \($_[1]);
   for (;;) {
      my $k = int(($i+$j)/2);
      *$bp = \($array->[$k]);

      my $cmp = $compare->()
         or return $k;

      if ($cmp < 0) {
         $j = $k-1;
         return _unsigned_to_signed(~$k) if $i > $j;
      } else {
         $i = $k+1;
         return _unsigned_to_signed(~$i) if $i > $j;
      }
   }
}

sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }
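The negative return value in the variant above is the bitwise complement of the insertion index, which is why the caller decodes it with ~$idx. Here is a minimal sketch of that round trip, assuming a perl where ~ on an integer yields an unsigned result (the default when `use integer` is not in effect). The name encode_miss is a hypothetical helper for illustration, not part of the post.

```perl
use strict;
use warnings;

# Same helper as in the post: reinterpret an unsigned integer (UV) as signed (IV).
sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }

# Hypothetical helper: encode an insertion point the way binsearch does on a miss.
sub encode_miss { _unsigned_to_signed(~$_[0]) }

for my $pos (0, 1, 5) {
    my $idx  = encode_miss($pos);   # negative for any valid insertion point
    my $back = ~$idx;               # complement again to recover $pos
    print "insert at $pos: returned $idx, decoded $back\n";
}
```

Because every encoded miss is negative, including the one for position 0, a caller can tell "found at index 0" apart from "not found, insert at index 0".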

I wonder if the following interface is better:

# Add $value to sorted @array, if it's not already there.
my $idx = binsearch { $value <=> $::_ } @array;
splice(@array, ~$idx, 0, $value) if $idx < 0;

sub binsearch(&\@) {
   my ($compare, $array) = @_;

   my $i = 0;
   my $j = $#$array;
   return $j if $j == -1;

   for (;;) {
      my $k = int(($i+$j)/2);

      my $cmp;
      $cmp = $compare->() for $array->[$k];
      return $k if !$cmp;

      if ($cmp < 0) {
         $j = $k-1;
         return _unsigned_to_signed(~$k) if $i > $j;
      } else {
         $i = $k+1;
         return _unsigned_to_signed(~$i) if $i > $j;
      }
   }
}

sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }

Re: Binary search algorithm.
by afoken (Chancellor) on Aug 13, 2011 at 05:11 UTC

"my" variable $lo masks earlier declaration in same scope at foo.pl line 14.
"my" variable $hi masks earlier declaration in same scope at foo.pl line 14.
"my" variable @array masks earlier declaration in same scope at foo.pl line 21.
"my" variable $try masks earlier declaration in same scope at foo.pl line 21.
"my" variable $word masks earlier declaration in same scope at foo.pl line 21.
"my" variable @a masks earlier declaration in same scope at foo.pl line 33.
syntax error at foo.pl line 12, near "<= :"
syntax error at foo.pl line 30, near "}"
Execution of foo.pl aborted due to compilation errors.

Untested, right? After removing the obvious vi command in line 12, I still get

"my" variable @a masks earlier declaration in same scope at foo.pl line 32.

Please don't post untested code.

use lib '/home/kaiyin/perl5/mylib'; Why? You don't load any non-core modules.
Why do you pass the searched word by reference?
Why don't you allow numeric compare?
Why don't you place your code in a reusable, testable module?
And finally, why don't you just use grep? At least with your trivial example, it is significantly faster.

I modified your code slightly, to show the speed differences.

Relevant NYTProf output:

Line  Statements  Time on line  Calls   Time in subs  Code
37    1           12µs                                for (1..100000) {
38    100000      928ms         100000  4.23s             $result=BinSearch(\@a, \$w);
      # spent 4.23s making 100000 calls to main::BinSearch, avg 42µs/call
39    100000      1.46s         100000  728ms             say $result;
      # spent 728ms making 100000 calls to main::CORE:say, avg 7µs/call
40    100000      1.43s                                   ($result)=grep { $a[$_] eq $w } 0..$#a;
41    100000      1.65s         100000  673ms             say $result;
      # spent 673ms making 100000 calls to main::CORE:say, avg 7µs/call
42                                                    }

Modified code:

my @a = qw(format type ascii hex pos len binary search perl unix eof array word);
@a = sort @a;
my $w = "len";
say "@a";
my $result;
open STDOUT,'>','/dev/null';
for (1..100000) {
    $result=BinSearch(\@a, \$w);
    say $result;
    ($result)=grep { $a[$_] eq $w } 0..$#a;
    say $result;
}

Even with 10,000 pieces of junk in front of the searched element, grep still beats your BinSearch easily:

Relevant NYTProf output:

Line  Statements  Time on line  Calls  Time in subs  Code
38    1           1.66ms                             for (1..1000) {
39    1000        24.1ms        1000   14.4s             $result=BinSearch(\@a, \$w);
      # spent 14.4s making 1000 calls to main::BinSearch, avg 14.4ms/call
40    1000        46.9ms        1000   34.3ms            say $result;
      # spent 34.3ms making 1000 calls to main::CORE:say, avg 34µs/call
41    1000        6.35s                                  ($result)=grep { $a[$_] eq $w } 0..$#a;
42    1000        62.3ms        1000   29.4ms            say $result;
      # spent 29.4ms making 1000 calls to main::CORE:say, avg 29µs/call

Modified code:

my @a = ('00junk') x 10000;
push @a,qw(format type ascii hex pos len binary search perl unix eof array word);
@a = sort @a;
my $w = "len";
say "@a";
my $result;
open STDOUT,'>','/dev/null';
for (1..1000) {
    $result=BinSearch(\@a, \$w);
    say $result;
    ($result)=grep { $a[$_] eq $w } 0..$#a;
    say $result;
}

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^2: Binary search algorithm.
by jwkrahn (Abbot) on Aug 13, 2011 at 05:33 UTC

"And finally, why don't you just use grep? At least with your trivial example, it is significantly faster."

You are comparing apples to oranges.

grep is always O( N ) and will work whether the data is sorted or not.

Binary Search has a best case of O( 1 ) and a worst case of O( log N ) and will only work if the data is sorted.
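To make the scaling claim concrete, here is a rough sketch that counts how many elements each approach examines on a sorted list of 1024 integers. The sub names grep_count and binary_count are hypothetical, and the sketch counts probes only; it says nothing about wall-clock time.

```perl
use strict;
use warnings;

# grep visits every index, so the probe count always equals the array length.
sub grep_count {
    my ($array, $want) = @_;
    my $n = 0;
    my ($idx) = grep { $n++; $array->[$_] == $want } 0 .. $#$array;
    return ($idx, $n);
}

# Binary search halves the remaining range on every probe.
sub binary_count {
    my ($array, $want) = @_;
    my ($lo, $hi, $n) = (0, $#$array, 0);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        $n++;
        my $cmp = $want <=> $array->[$mid];
        return ($mid, $n) if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
    }
    return (undef, $n);   # not found
}

my @sorted = (1 .. 1024);
for my $want (512, 1, 1024) {
    my (undef, $gn) = grep_count(\@sorted, $want);
    my (undef, $bn) = binary_count(\@sorted, $want);
    print "want=$want: grep examined $gn elements, binary search examined $bn\n";
}
```

The first target happens to sit at the midpoint, which is the O( 1 ) best case; the other two sit at the ends of the array and need roughly log2(N) probes.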

Re^3: Binary search algorithm.
by ikegami (Patriarch) on Aug 13, 2011 at 06:00 UTC

Actually, you are. The formulas you give describe how the running time of each algorithm *scales*, whereas afoken was talking about how fast they actually are.

Although you do bring up a good point: afoken only measured the worst case.

Re: Binary search algorithm.
by jwkrahn (Abbot) on Aug 13, 2011 at 05:14 UTC

my @array = @$arrayref;
my $word = $$wordref;

Why copy all this data? Why not just use $arrayref and $wordref directly?

while ($lo <= :w $hi) {

What is the meaning of :w in the code? That looks like a syntax error.

$lo = ++$try; # without increment there will be
...
$hi = --$try;

Why are you modifying $try? It always gets assigned from int(($lo + $hi)/2) at the top of the loop.

if ($array[$try] lt $word) {
    $lo = ++$try; # without increment there will be
                  # a dead loop in the 4th case,
                  # see the note at the bottom
    next
} elsif ($array[$try] gt $word) {
    $hi = --$try;
    next
} else {
    return $try
}

The use of next is superfluous because of the way that if elsif else works.
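Taken together, the points above suggest a loop along the following lines. This is only a sketch: the original BinSearch is not shown in full here, so the sub name bin_search, the string comparison with cmp, and the choice to return undef on a miss are all assumptions. It dereferences $arrayref and $wordref in place instead of copying, leaves $try unmodified, and drops the redundant next.

```perl
use strict;
use warnings;

# Hypothetical rewrite: takes the same (\@array, \$word) arguments as the
# BinSearch calls in this thread, but never copies the array.
sub bin_search {
    my ($arrayref, $wordref) = @_;
    my ($lo, $hi) = (0, $#$arrayref);
    while ($lo <= $hi) {
        my $try = int(($lo + $hi) / 2);
        my $cmp = $arrayref->[$try] cmp $$wordref;
        if    ($cmp < 0) { $lo = $try + 1 }   # target is in the upper half
        elsif ($cmp > 0) { $hi = $try - 1 }   # target is in the lower half
        else             { return $try }
    }
    return;   # assumption: undef signals "not found"
}

my @a = sort qw(format type ascii hex pos len binary search perl unix eof array word);
my $w = 'len';
my $idx = bin_search(\@a, \$w);
print defined $idx ? "found '$w' at index $idx\n" : "'$w' not found\n";
```

Moving $lo and $hi past $try is what guarantees the range shrinks on every pass, so the loop cannot spin forever even without the in-place ++$try and --$try.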

Re^2: Binary search algorithm.
by ikegami (Patriarch) on Aug 13, 2011 at 05:18 UTC