#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use lib '/home/kaiyin/perl5/mylib';

sub BinSearch {
    my ($arrayref, $wordref) = @_;
    my @array = @$arrayref;
    my $word = $$wordref;
    my ($lo, $hi) = (0, $#array);
    while ($lo <= :w $hi) {
        my $try = int(($lo + $hi)/2);
        if ($array[$try] lt $word) {
            $lo = ++$try;
            # without increment there will be
            # a dead loop in the 4th case,
            # see the note at the bottom
            next
        } elsif ($array[$try] gt $word) {
            $hi = --$try;
            next
        } else {
            return $try
        }
    }
    return;
}

my @a = qw(format type ascii hex pos len binary search perl unix eof array word);
my @a = sort @a;
my $w = "len";
say "@a";
say BinSearch(\@a, \$w);

# There are only 5 cases to consider:
# * - -
# - * -   trivial
# - - *
# - *
# * -     trivial
#
# Among which 2 are trivial, that is to say
# $array[$try] will immediately match $word

Binary search algorithm.

Re: Binary search algorithm.
by ikegami (Patriarch) on Aug 13, 2011 at 04:56 UTC

    The variant below has two features yours doesn't:

    • It allows one to use any compare function, not just cmp.
    • It returns the index at which the element should be found when it is not found.
    # Add $value to sorted @array, if it's not already there.
    my $idx = binsearch { $a <=> $b } $value, @array;
    splice(@array, ~$idx, 0, $value) if $idx < 0;

    sub binsearch(&$\@) {
        my $compare = $_[0];
        #my $value  = $_[1];
        my $array   = $_[2];

        my $i = 0;
        my $j = $#$array;
        return $j if $j == -1;

        my $ap = do { no strict 'refs'; \*{caller().'::a'} };  local *$ap;
        my $bp = do { no strict 'refs'; \*{caller().'::b'} };  local *$bp;

        *$ap = \($_[1]);
        for (;;) {
            my $k = int(($i+$j)/2);
            *$bp = \($array->[$k]);

            my $cmp = $compare->()
                or return $k;

            if ($cmp < 0) {
                $j = $k-1;
                return _unsigned_to_signed(~$k) if $i > $j;
            } else {
                $i = $k+1;
                return _unsigned_to_signed(~$i) if $i > $j;
            }
        }
    }

    sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }

    I wonder if the following interface is better:

    # Add $value to sorted @array, if it's not already there.
    my $idx = binsearch { $value <=> $::_ } @array;
    splice(@array, ~$idx, 0, $value) if $idx < 0;

    sub binsearch(&\@) {
        my ($compare, $array) = @_;

        my $i = 0;
        my $j = $#$array;
        return $j if $j == -1;

        for (;;) {
            my $k = int(($i+$j)/2);

            my $cmp;
            $cmp = $compare->() for $array->[$k];
            return $k if !$cmp;

            if ($cmp < 0) {
                $j = $k-1;
                return _unsigned_to_signed(~$k) if $i > $j;
            } else {
                $i = $k+1;
                return _unsigned_to_signed(~$i) if $i > $j;
            }
        }
    }

    sub _unsigned_to_signed { unpack('j', pack('J', $_[0])) }
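
To make the return convention above concrete: a miss is reported as the bitwise complement of the insertion point, which is always negative, so `~$idx` recovers the position to splice at. The following is only a minimal sketch under that reading. `find_or_insertion_point` is a hypothetical helper written for illustration (numeric comparison only, array passed by reference); it is not ikegami's `binsearch` and skips the `$a`/`$b` aliasing.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: returns the index of $value in the
# sorted array, or a negative number encoding where $value would be inserted.
sub find_or_insertion_point {
    my ($value, $array) = @_;
    my ($lo, $hi) = (0, $#$array);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $value <=> $array->[$mid];
        return $mid if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
    }
    # $lo is now the insertion point; -$lo - 1 is the signed form of ~$lo.
    return -$lo - 1;
}

my @sorted = (10, 20, 30, 40);
for my $value (25, 30, 5, 50) {
    my $idx = find_or_insertion_point($value, \@sorted);
    # ~$idx turns the negative result back into the insertion index.
    splice(@sorted, ~$idx, 0, $value) if $idx < 0;
}
print "@sorted\n";   # prints "5 10 20 25 30 40 50"
```

Because the miss value is never zero or positive, index 0 stays distinguishable from "insert at position 0", which a plain negated index could not express.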
Re: Binary search algorithm.
by afoken (Chancellor) on Aug 13, 2011 at 05:11 UTC
    • "my" variable $lo masks earlier declaration in same scope at foo.pl line 14.
      "my" variable $hi masks earlier declaration in same scope at foo.pl line 14.
      "my" variable @array masks earlier declaration in same scope at foo.pl line 21.
      "my" variable $try masks earlier declaration in same scope at foo.pl line 21.
      "my" variable $word masks earlier declaration in same scope at foo.pl line 21.
      "my" variable @a masks earlier declaration in same scope at foo.pl line 33.
      syntax error at foo.pl line 12, near "<= :"
      syntax error at foo.pl line 30, near "}"
      Execution of foo.pl aborted due to compilation errors.

      Untested, right? After removing the obvious vi command in line 12, I still get

      "my" variable @a masks earlier declaration in same scope at foo.pl line 32.

      Please don't post untested code.

    • use lib '/home/kaiyin/perl5/mylib';
      Why? You don't load any non-core modules.
    • Why do you pass the searched word by reference?
    • Why don't you allow numeric compare?
    • Why don't you place your code in a reusable, testable module?
    • And finally, why don't you just use grep? At least with your trivial example, it is significantly faster.

    I modified your code slightly, to show the speed differences.

    Relevant NYTProf output:

    Line  Statements  Time on line  Calls   Time in subs  Code
    37    1           12µs                                for (1..100000) {
    38    100000      928ms         100000  4.23s             $result=BinSearch(\@a, \$w);
                                                              # spent 4.23s making 100000 calls to main::BinSearch, avg 42µs/call
    39    100000      1.46s         100000  728ms             say $result;
                                                              # spent 728ms making 100000 calls to main::CORE:say, avg 7µs/call
    40    100000      1.43s                                   ($result)=grep { $a[$_] eq $w } 0..$#a;
    41    100000      1.65s         100000  673ms             say $result;
                                                              # spent 673ms making 100000 calls to main::CORE:say, avg 7µs/call
    42                                                    }

    Modified code:

    my @a = qw(format type ascii hex pos len binary search perl unix eof array word);
    @a = sort @a;
    my $w = "len";
    say "@a";
    my $result;
    open STDOUT,'>','/dev/null';
    for (1..100000) {
        $result=BinSearch(\@a, \$w);
        say $result;
        ($result)=grep { $a[$_] eq $w } 0..$#a;
        say $result;
    }

    Even with 10,000 pieces of junk in front of the searched element, grep still beats your BinSearch easily:

    Relevant NYTProf output:

    Line  Statements  Time on line  Calls   Time in subs  Code
    38    1           1.66ms                              for (1..1000) {
    39    1000        24.1ms        1000    14.4s             $result=BinSearch(\@a, \$w);
                                                              # spent 14.4s making 1000 calls to main::BinSearch, avg 14.4ms/call
    40    1000        46.9ms        1000    34.3ms            say $result;
                                                              # spent 34.3ms making 1000 calls to main::CORE:say, avg 34µs/call
    41    1000        6.35s                                   ($result)=grep { $a[$_] eq $w } 0..$#a;
    42    1000        62.3ms        1000    29.4ms            say $result;
                                                              # spent 29.4ms making 1000 calls to main::CORE:say, avg 29µs/call

    Modified code:

    my @a = ('00junk') x 10000;
    push @a,qw(format type ascii hex pos len binary search perl unix eof array word);
    @a = sort @a;
    my $w = "len";
    say "@a";
    my $result;
    open STDOUT,'>','/dev/null';
    for (1..1000) {
        $result=BinSearch(\@a, \$w);
        say $result;
        ($result)=grep { $a[$_] eq $w } 0..$#a;
        say $result;
    }

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      And finally, why don't you just use grep? At least with your trivial example, it is significantly faster.

      You are comparing apples to oranges.

      grep is always O( N ) and will work whether the data is sorted or not.

      Binary Search has a best case of O( 1 ) and a worst case of O( log N ) and will only work if the data is sorted.

        Actually, you are. The formulas you are giving describe how the running time of the algorithms *scales* with input size, whereas afoken was talking about how fast they actually are.

        Although you do bring up a good point: afoken only measured the worst case.
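
One way to separate the scaling argument from the wall-clock argument is to count comparisons instead of measuring time. The following is a rough sketch, not a benchmark: the 1024-element data set and the target are made up for illustration, and the binary search is a plain textbook loop rather than the `BinSearch` from the original post (so it has no array-copying overhead).

```perl
use strict;
use warnings;

# Hypothetical data: 1024 sorted, zero-padded strings (not the thread's word list).
my @sorted = map { sprintf '%04d', $_ } 0 .. 1023;
my $target = '0700';

# grep visits every element, even after a match has been found.
my $linear_steps = 0;
my ($linear_idx) = grep { $linear_steps++; $sorted[$_] eq $target } 0 .. $#sorted;

# Binary search halves the remaining range on every pass.
my $binary_steps = 0;
my ($lo, $hi, $binary_idx) = (0, $#sorted);
while ($lo <= $hi) {
    my $mid = int(($lo + $hi) / 2);
    $binary_steps++;
    my $cmp = $sorted[$mid] cmp $target;
    if    ($cmp < 0) { $lo = $mid + 1 }
    elsif ($cmp > 0) { $hi = $mid - 1 }
    else             { $binary_idx = $mid; last }
}

print "grep:   index $linear_idx after $linear_steps comparisons\n";
print "binary: index $binary_idx after $binary_steps comparisons\n";
```

The grep count always equals the array length, while the binary search count stays at or below 11 for 1024 elements. Whether that translates into less elapsed time still depends on per-call overhead, which is what the profile above was measuring.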

Re: Binary search algorithm.
by jwkrahn (Abbot) on Aug 13, 2011 at 05:14 UTC
    my @array = @$arrayref;
    my $word = $$wordref;

    Why copy all this data? Why not just use $arrayref and $wordref directly?



    while ($lo <= :w $hi) {

    What is the meaning of :w in the code?    That looks like a syntax error.



    $lo = ++$try; # without increment there will be
    ...
    $hi = --$try;

    Why are you modifying $try?    It always gets assigned from int(($lo + $hi)/2) at the top of the loop.



    if ($array[$try] lt $word) {
        $lo = ++$try;
        # without increment there will be
        # a dead loop in the 4th case,
        # see the note at the bottom
        next
    } elsif ($array[$try] gt $word) {
        $hi = --$try;
        next
    } else {
        return $try
    }

    The use of next is superfluous because of the way that if/elsif/else works: once a branch has run, control reaches the end of the loop body anyway.

      What is the meaning of :w in the code?

      :w is vi's save ("write") command. He must have typed it while still in insert mode.

      Why are you modifying $try? It always gets assigned from int(($lo + $hi)/2) at the top of the loop.

      Yes, but $hi or $lo changes in every loop pass.

        Yes, but $hi or $lo changes in every loop pass.

        Yes but that doesn't mean you have to modify $try.
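
Putting the points from this sub-thread together, here is one possible reworking of the original sub. It is a sketch, not the original poster's code: it assumes the word is passed as a plain scalar rather than by reference, uses the array reference directly instead of copying, leaves `$try` unmodified, and drops the `next` statements.

```perl
use strict;
use warnings;

# Sketch only: a reworked BinSearch, not the original poster's code.
sub BinSearch {
    my ($array, $word) = @_;               # use the array reference directly, no copy
    my ($lo, $hi) = (0, $#$array);
    while ($lo <= $hi) {
        my $try = int(($lo + $hi) / 2);
        my $cmp = $array->[$try] cmp $word;
        if    ($cmp < 0) { $lo = $try + 1 }   # $try itself stays untouched
        elsif ($cmp > 0) { $hi = $try - 1 }
        else             { return $try }
    }
    return;                                 # not found
}

my @a = sort qw(format type ascii hex pos len binary search perl unix eof array word);
my $w = 'len';
print BinSearch(\@a, $w), "\n";   # prints 6
```

Moving `$lo` and `$hi` past `$try` is what prevents the endless loop the original comment worried about; assigning `$try + 1` or `$try - 1` does that without changing `$try` as a side effect.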