As more food for thought, I'll present an application of binary search
where it
doesn't make sense to 'use a hash'.
Suppose you're looking up records in a large sorted file.
The following
search subroutine will search the file efficiently,
using a binary search. It gets two arguments:
$fh is a filehandle
open to the file you want to search, and
$cmp is
a comparison function.
search will call
$cmp
repeatedly with various records from the file;
$cmp should return zero if its argument is the desired record,
negative if its argument is earlier than the desired record,
and positive if its argument is later than the desired record.
search then returns the first record in the file that is
not earlier than the desired record, and leaves the filehandle
positioned at that record. If there is no such record, it
returns undef.
sub search {
my ($fh, $cmp) = @_;
my ($lo, $hi) = (0, -s $fh);
while (1) {
my $mid = int(($lo + $hi)/2);
if ($mid) {
seek $fh, $mid-1, SEEK_SET;
my $junk = <$fh>;
} else { seek $fh , $mid, SEEK_SET; }
my $start = tell $fh;
my $rec = <$fh>;
return unless defined $rec;
chomp($rec);
if ($hi == $lo) {
seek $fh, $start, SEEK_SET;
return $rec
};
local $_ = $rec;
if ($cmp->($rec) < 0) { $lo = $mid+1 }
else { $hi = $mid }
}
}
For example, suppose you want to search the dictionary file
for words that begin with
foo. You might:
open DICT, "<", "/usr/dict/Web2" or die...;
if (search(\*DICT, sub { lc $_ cmp "foo" })) {
# DICT is now positioned at the first word beginning with 'foo'
while (<DICT>) {
last unless /^foo/i;
print;
}
}
The
search call moves the filehandle quickly to the
first record that begins with
foo, and then the
while loop prints out the words, quitting after the last one.
Using
search is quicker than scanning the file from the beginning
looking for the first
foo word, unless the file
is very small.
The behavior is similar to that of the standard Search::Dict
module, but the code is much simpler, and I think my interface is more flexible.
I wrote this as an example for my new class
Strategies for Lightweight Databases.
Share and enjoy!
--
Mark Dominus
Perl Paraphernalia