Perl-Sensitive Sunglasses PerlMonks

### comment on

 Need Help??

I would think a binary search of the array would be a lot more efficient than a linear pass when n=500,000, which might take on average 250,000 comparisons to locate the range.

pickleswarlz, If the term 'binary search' scares you, there is an exercise in "Learning Perl" where you write a program that picks a number between 0-100 and then invites a user to guess the number. If the user starts off by picking 50, then the program should respond by saying 'higher' or 'lower'. What is the quickest way to guess the number that the program picked? Well, if the computer says higher, you should pick 75 next. Then if the computer says lower, you should pick 62 or 63 next. Do you see how halving the range each time will arrive at guessing the correct number the quickest? Try it with a friend. It will only take you 4-5 tries to guess the number if each time you guess the middle of the range that you know contains the number. That process is known as a 'binary search'.

The function below uses a binary search to return the (lowest) numbers in an array that bracket a specified number. Once you have all your ID's in a sorted array, you can search the array for the two numbers that bracket a given number:

```
sub binary_search {
#Called like this:
#binary_search(\@data, \$#data, \$num);

my (\$aref, \$high_index, \$target) = @_;
my \$low_index = 0;

if ( \$target < \$aref->[\$low_index]
or \$target > \$aref->[\$high_index] ) {

die "Target not in range."
}

my (\$middle_index, \$middle_value);

while (\$low_index <= \$high_index) {

\$middle_index = int( (\$high_index + \$low_index)/2 );
\$middle_value = \$aref->[\$middle_index];
print '.';  #print a dot for each attempt to locate target

if (\$middle_value < \$target) {
\$low_index = \$middle_index + 1;
}
elsif (\$middle_value > \$target) {
\$high_index = \$middle_index - 1;
}
elsif (\$middle_value == \$target) {
print "*";   #Indicates the target is actually in @data

return \$middle_index == 0
? (\$middle_value, \$aref->[1])
: (\$aref->[\$middle_index-1], \$middle_value)
;
}
}

while (\$aref->[\$low_index] >= \$target) {
--\$low_index;
}

return \$aref->[\$low_index], \$aref->[\$low_index+1];

}

say '';

my @data = (1, 5, 6, 7, 8, 9, 10, 20, 30, 33, 34, 38, 41, 100, 101, 10
+2, 110);

for my \$num (1 .. \$data[-1]) {
my @results = binary_search(\@data, \$#data, \$num);
say "\$num: @results";
}

--output:--
....*1: 1 5
....2: 1 5
....3: 1 5
....4: 1 5
...*5: 1 5
....*6: 5 6
..*7: 6 7
....*8: 7 8
...*9: 8 9
....*10: 9 10
.....11: 10 20
.....12: 10 20
.....13: 10 20
.....14: 10 20
.....15: 10 20
.....16: 10 20
.....17: 10 20
.....18: 10 20
.....19: 10 20
.....*20: 10 20
.....21: 20 30
.....22: 20 30
.....23: 20 30
.....24: 20 30
.....25: 20 30
.....26: 20 30
.....27: 20 30
.....28: 20 30
.....29: 20 30
.*30: 20 30
....31: 30 33
....32: 30 33
....*33: 30 33
...*34: 33 34
....35: 34 38
....36: 34 38
....37: 34 38
....*38: 34 38
....39: 38 41
....40: 38 41
..*41: 38 41
....42: 41 100
....43: 41 100
....44: 41 100
....45: 41 100
....46: 41 100
....47: 41 100
....48: 41 100
....49: 41 100
....50: 41 100
....51: 41 100
....52: 41 100
....53: 41 100
....54: 41 100
....55: 41 100
....56: 41 100
....57: 41 100
....58: 41 100
....59: 41 100
....60: 41 100
....61: 41 100
....62: 41 100
....63: 41 100
....64: 41 100
....65: 41 100
....66: 41 100
....67: 41 100
....68: 41 100
....69: 41 100
....70: 41 100
....71: 41 100
....72: 41 100
....73: 41 100
....74: 41 100
....75: 41 100
....76: 41 100
....77: 41 100
....78: 41 100
....79: 41 100
....80: 41 100
....81: 41 100
....82: 41 100
....83: 41 100
....84: 41 100
....85: 41 100
....86: 41 100
....87: 41 100
....88: 41 100
....89: 41 100
....90: 41 100
....91: 41 100
....92: 41 100
....93: 41 100
....94: 41 100
....95: 41 100
....96: 41 100
....97: 41 100
....98: 41 100
....99: 41 100
....*100: 41 100
...*101: 100 101
....*102: 101 102
.....103: 102 110
.....104: 102 110
.....105: 102 110
.....106: 102 110
.....107: 102 110
.....108: 102 110
.....109: 102 110
.....*110: 102 110

And after reading up on the cpan module List::BinarySearch, whose author appears to really know how to construct efficient search algorithms, I would consider using that module's bsearch_num_pos() function, which returns the optimal insertion index for a given number:

``` # The following [function returns] an optimal insert point if no match.

my @num_array =   ( 100, 200, 300, 400 );

\$index = bsearch_num_pos 500, @num_array; # Returns 4 - Best insert-at position.
```

Given the index for the insertion, you can get the range the number falls in with \$data[\$index-1]and \$data[\$index]. You just have to adjust for the case when the insertion point is 0, like I did in my code.

```

use List::BinarySearch qw(bsearch_num_pos);
...
...

sub cpan_binary_search {

my (\$aref, \$high_index, \$target) =  @_;
my \$low_index = 0;

if ( \$target < \$aref->[\$low_index]
or \$target > \$aref->[\$high_index] ) {

die "Target not in range."
}

my \$index = bsearch_num_pos(\$target, @\$aref);

return \$index == 0 ? (\$aref->[0], \$aref->[1])
: (\$aref->[\$index-1], \$aref->[\$index]);

}

my @data = (1, 5, 6, 7, 8, 9, 10, 20, 30, 33, 34, 38, 41, 100, 101, 10
+2, 110);

for my \$num (1 .. \$data[-1]) {
my @results = cpan_binary_search(\@data, \$#data, \$num);
say "\$num: @results";
}

I'll have to run this Perl script on multiple files, so I'd love to be able to specify the input and output files in the command line as I go rather than specifying them in the script itself.

Any command line arguments can be found in the @ARGV array:

```use strict;
use warnings;
use 5.012;

say for @ARGV;

say "\n";

my (\$in, \$out) = @ARGV;
say \$in;
say \$out;

--output:--
\$  perl prog1.pl infile_name outfile_name 10 20
infile_name
outfile_name
10
20

infile_name
outfile_name

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

• Are you posting in the right place? Check out Where do I post X? to know for sure.
• Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
• Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
• Want more info? How to link or How to display code and escape characters are good places to start.

Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2022-12-06 23:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?