comment on

As more food for thought, I'll present an application of binary search where it doesn't make sense to 'use a hash'. Suppose you're looking up records in a large sorted file. The following search subroutine will search the file efficiently, using a binary search. It gets two arguments: $fh is a filehandle open to the file you want to search, and $cmp is a comparison function. search will call $cmp repeatedly with various records from the file; $cmp should return zero if its argument is the desired record, negative if its argument is earlier than the desired record, and positive if its argument is later than the desired record.

search then returns the first record in the file that is not earlier than the desired record, and leaves the filehandle positioned at that record. If there is no such record, it returns undef.

sub search {
  my ($fh, $cmp) = @_;
  my ($lo, $hi) = (0, -s $fh);

  while (1) {
    my $mid = int(($lo + $hi)/2);
    if ($mid) {
      seek $fh, $mid-1, SEEK_SET;
      my $junk = <$fh>;
    } else { seek $fh , $mid, SEEK_SET; }
    my $start = tell $fh;
    my $rec = <$fh>;
    return unless defined $rec;
    chomp($rec);

    if ($hi == $lo) { 
      seek $fh, $start, SEEK_SET;
      return $rec 
    };
    local $_ = $rec;
    if ($cmp->($rec) < 0) { $lo = $mid+1 }
    else                  { $hi = $mid   }
  }
}
[download]

For example, suppose you want to search the dictionary file for words that begin with foo. You might:

  open DICT, "<", "/usr/dict/Web2" or die...;
  if (search(\*DICT, sub { lc $_ cmp "foo" })) {
    # DICT is now positioned at the first word beginning with 'foo'
    while (<DICT>) {
      last unless /^foo/i;
      print;
    }
  }
[download]

The search call moves the filehandle quickly to the first record that begins with foo, and then the while loop prints out the words, quitting after the last one. Using search is quicker than scanning the file from the beginning looking for the first foo word, unless the file is very small.

The behavior is similar to that of the standard Search::Dict module, but the code is much simpler, and I think my interface is more flexible. I wrote this as an example for my new class Strategies for Lightweight Databases. Share and enjoy!

--
Mark Dominus
Perl Paraphernalia

In reply to Re: Binary searching with Perl. by Dominus
in thread Binary searching with Perl. by DigitalKitty

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.