Re: A better way of lookup?

Here's a solution that uses List::BinarySearch::XS. You could just use List::BinarySearch, since installing it will pull in the XS version and use it automatically, but I included ::XS to be explicit.

You would have to benchmark this in a tight loop. One optimization would be to not pass the $h array ref in to bsearch_lookup. And you could probably come up with a more efficient code block to pass in to the binsearch_pos function, but it's a start. I think that for large crossreference lookup sets, this will beat a linear search.

use List::BinarySearch::XS 'binsearch_pos';

my @lookup_values = (
    [ 25000,        2500        ],
    [ 50000,        5000        ],
    [ 150000,       12500       ],
    [ 225000,       25000       ],
    [ 300000,       37500       ],
    [ 600000,       60000       ],
    [ 1200000,      120000      ],
    [ 3600000,      300000      ],
    [ 5400000,      600000      ],
    [ 10800000,     900000      ],
    [ 21600000,     1800000     ],
    [ 43200000,     3600000     ],
    [ 64800000,     7200000     ],
    [ 129600000,    10800000    ],
    [ 216000000,    21600000    ],
    [ 432000000,    43200000    ],
    [ 864000000,    86400000    ],
    [ 1728000000,   172800000   ],
    [ 3024000000,   345600000   ],
    [ 6048000000,   604800000   ],
    [ 12096000000,  1209600000  ],
    [ 31557600000,  2629800000  ],
    [ 63115200000,  5259600000  ],
    [ 78894000000,  7889400000  ],
    [ 157788000000, 15778800000 ],
);

sub bsearch_lookup {
    my( $n, $h ) = @_;
    my $pos = binsearch_pos { $a+1 <=> $b->[0] } $n, @$h;
    $pos > $#$h && return 31557600000;
    return $h->[$pos][1];
}

my @tests = ( 100, 25000, 50000, ( map { int rand 160000000000 } 1 .. 
+10 ), 160000000000 ); # Make sure we have some edge cases
foreach my $test ( @tests ) {
    my $bmatch = bsearch_lookup( $test, \@lookup_values );
    print "Bsearch Found : $test => $bmatch\n\n";
[download]

If you want to stick with a linear search approach you could probably use List::Utils first. That would also pull most of the work into an XS-based routine, but of course it would be a linear search.

Update: For the small list of thresholds that you provided, it's hard to beat the simple search. I haven't tested with a larger list of thresholds, mostly because I'm not enthusiastic about generating the larger list. Here are results for three possible solutions given your list of 25 (well, really 26 if you count the default return value) thresholds:

          Rate   first bsearch  simple
first   33.8/s      --    -25%    -41%
bsearch 44.9/s     33%      --    -21%
simple  57.1/s     69%     27%      --
[download]

And here is the code that generated that report:

use Benchmark 'cmpthese';
use List::BinarySearch::XS 'binsearch_pos';
use List::Util 'first';

my @lookup_values = (
    [ 25000,        2500        ],
    [ 50000,        5000        ],
    [ 150000,       12500       ],
    [ 225000,       25000       ],
    [ 300000,       37500       ],
    [ 600000,       60000       ],
    [ 1200000,      120000      ],
    [ 3600000,      300000      ],
    [ 5400000,      600000      ],
    [ 10800000,     900000      ],
    [ 21600000,     1800000     ],
    [ 43200000,     3600000     ],
    [ 64800000,     7200000     ],
    [ 129600000,    10800000    ],
    [ 216000000,    21600000    ],
    [ 432000000,    43200000    ],
    [ 864000000,    86400000    ],
    [ 1728000000,   172800000   ],
    [ 3024000000,   345600000   ],
    [ 6048000000,   604800000   ],
    [ 12096000000,  1209600000  ],
    [ 31557600000,  2629800000  ],
    [ 63115200000,  5259600000  ],
    [ 78894000000,  7889400000  ],
    [ 157788000000, 15778800000 ],
);  

my @tests = ( 0, 100, 25000, 50000, ( map { int rand 160000000000 } 1 
+.. 10000 ), 160000000000 );

sub simple_lookup {
    my( $n, $h ) = @_;
    $n < $_->[0] && return $_->[1] for @$h;
    return 31557600000;
}   

sub bsearch_lookup {
    my( $n, $h ) = @_;
    my $pos = binsearch_pos { $a <=> $b->[0] } $n+1, @$h;
    $pos > $#$h && return 31557600000;
    return $h->[$pos][1]; 
}   

sub first_lookup {
    my( $n, $h ) = @_;
    my $res = first {  $n < $_->[0] } @$h;
    return ref $res ? $res->[1] : 31557600000;
}   

cmpthese( -5, {
    simple  => sub { my @result = map{ bsearch_lookup( $_, \@lookup_va
+lues ) } @tests; },
    bsearch => sub { my @result = map{ simple_lookup ( $_, \@lookup_va
+lues ) } @tests; },
    first   => sub { my @result = map{ first_lookup  ( $_, \@lookup_va
+lues ) } @tests; },
});
[download]

For the worst case, both the 'simple' and 'first' lookups will need to do 25 threshold comparisons, whereas the 'bsearch' lookup will do about 5. That tradeoff isn't worthwhile for small lookup tables. If your lookup table were to grow to, for example, 1000, the 'simple' and 'first' lookups will require 1000 comparisons in the worst case, and the 'bsearch' method will require about 10 (you know all this, of course; I'm just thinking out loud). The advantage in that case would probably lean toward the 'bsearch' method. It would be interesting to know at what point that method becomes worthwhile.

Dave

Comment on Re: A better way of lookup? Select or Download Code