All,
Here are the preliminary results:

$ perl benchmark.pl
             Rate hardburn_1 L~R_3 L~R_2 tachyon hardburn_2 delirium kvale  tye  L~R BrowserUk
hardburn_1 2.15/s         --   -8%  -16%    -17%       -18%     -19%  -22% -23% -23%      -90%
L~R_3      2.35/s         9%    --   -8%     -9%       -11%     -12%  -15% -16% -16%      -90%
L~R_2      2.56/s        19%    9%    --     -1%        -3%      -4%   -7%  -8%  -8%      -89%
tachyon    2.58/s        20%   10%    1%      --        -2%      -3%   -6%  -7%  -7%      -89%
hardburn_2 2.63/s        22%   12%    3%      2%         --      -1%   -5%  -6%  -6%      -88%
delirium   2.67/s        24%   14%    4%      3%         1%       --   -3%  -4%  -4%      -88%
kvale      2.76/s        28%   17%    8%      7%         5%       3%    --  -1%  -1%      -88%
tye        2.78/s        29%   19%    9%      8%         6%       4%    1%   --  -0%      -88%
L~R        2.79/s        30%   19%    9%      8%         6%       5%    1%   0%   --      -88%
BrowserUk  22.5/s       944%  857%  778%    770%       755%     743%  715% 708% 705%        --

Here is the script I used to build the list:

#!/usr/bin/perl
use strict;
use warnings;

open (LIST, '>', "list.rnd") or die "Unable to open list.rnd : $!";

for ( 1 .. 10_000 ) {
    my $length = int(rand 240) + 10;
    my $string = "";
    $string .= ('a' .. 'z')[ rand 26] while length $string < $length;
    print LIST "$string\n";
}

And finally, here is the benchmark itself:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';


my @list;
open (LIST, '<', 'list.rnd') or die "Unable to open list.rnd for reading : $!";
while ( <LIST> ) {
    chomp;
    push @list, $_;
}

my $length = int(rand 240) + 10;
my $str_a = "";
$str_a .= ('a' .. 'z')[ rand 26] while length $str_a < $length;

sub expensive_function {
    my ($str_a, $str_b) = @_;
    my $foo;
    for ( split // , $str_a . $str_b) {
        $foo++;
    }
}

cmpthese -5, {
    'tachyon' => sub {
        use Inline C =>;
        for my $str_b ( @list ) {
            next if ! same_scan( $str_a, $str_b );
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R' => sub {
        for my $str_b ( @list ) {
            next if index($str_a, substr($str_b, 0, 1));
            expensive_function ( $str_a, $str_b );
        }
    },
    'hardburn_1' => sub {
        my ($a_1st, $rest) = split '', $str_a, 2;
        for my $str_b ( @list ) {
            my ($b_1st, $rest) = split '', $str_b, 2;
            next if $b_1st ne $a_1st;
            expensive_function ( $str_a, $str_b );
        }
    },
    'hardburn_2' => sub {
        my $rev_a = reverse $str_a;
        my $a_1st = chop $rev_a;
        for my $str_b ( @list ) {
            my $rev_b = reverse $str_b;
            my $b_1st = chop $rev_b;
            next if $b_1st ne $a_1st;
            expensive_function ( $str_a, $str_b );
        }
    },
    'delirium' => sub {
        my $fc = substr($str_a,0,1);
        for my $str_b ( @list ) {
            next if $str_b !~ /^$fc/;
            expensive_function ( $str_a, $str_b );
        }
    },
    'BrowserUk' => sub {
        my %list;
        @list{ map{ substr $_, 0, 1 } @list } = ();
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
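            # NB: %list holds the first character of every string in @list, so
            # with 10,000 random strings this test is almost certainly always
            # true and expensive_function is effectively never reached here.
            next if exists $list{ $fc };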
            expensive_function ( $str_a, $str_b );
        }
    },
    'tye' => sub {
        for my $str_b ( @list ) {
            next if ord( $str_a ) != ord( $str_b );
            expensive_function ( $str_a, $str_b );
        }
    },
    'kvale' => sub {
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
            next if $fc ne substr $str_b, 0, 1;
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R_2' => sub {
        my $fc = unpack "a1" , $str_a;
        for my $str_b ( @list ) {
            next if $fc ne unpack "a1", $str_b;
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R_3' => sub {
        my ($fc_a) = $str_a =~ /^(.)/;
        for my $str_b ( @list ) {
            my ($fc_b) = $str_b =~ /^(.)/;
            next if $fc_a ne $fc_b;
            expensive_function ( $str_a, $str_b );
        }
    },
};
__END__
__C__
int same_scan(char* str1, char* str2)
{
  return str1[0] == str2[0] ? 1 : 0;
}

Note: I have not yet verified that each of these tests calls expensive_function the same number of times, so treat the numbers as preliminary and please check back if you are interested. I need to head home now.
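
One way that check could be done is sketched below. It is not part of the benchmark as run: it assumes a tiny hand-made list and a fixed $str_a (both hypothetical stand-ins for list.rnd and the random string), and it only reproduces the "next if ..." conditions of four of the entries to count how many strings would reach expensive_function under each.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in data; the real benchmark reads list.rnd instead.
my $str_a = 'apple';
my @list  = qw(avocado banana apricot cherry almond blueberry);

# Each sub returns true when $str_b should be SKIPPED, mirroring the
# "next if ..." condition of the corresponding benchmark entry.
my %skip = (
    'kvale'     => sub { substr($_[0], 0, 1) ne substr($_[1], 0, 1) },
    'tye'       => sub { ord($_[0]) != ord($_[1]) },
    'L~R'       => sub { index($_[0], substr($_[1], 0, 1)) },
    'BrowserUk' => sub {
        # Same hash construction as in the posted entry.
        my %seen;
        @seen{ map { substr $_, 0, 1 } @list } = ();
        exists $seen{ substr $_[0], 0, 1 };
    },
);

# Count how many strings survive each filter, i.e. how many times
# expensive_function would be called.
my %calls;
for my $name ( sort keys %skip ) {
    $calls{$name} = grep { !$skip{$name}->( $str_a, $_ ) } @list;
}

printf "%-10s %d\n", $_, $calls{$_} for sort keys %calls;
```

With this sample data the kvale, tye and L~R conditions each let three strings through, while the BrowserUk condition as posted lets none through, which would be consistent with its much higher rate in the table above.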

Cheers - L~R

Update: See updated benchmark.

In reply to Re: Matching First Character of Strings Efficiently by Limbic~Region
in thread Matching First Character of Strings Efficiently by Limbic~Region
