All,
Here are the preliminary results:
```
$ perl benchmark.pl
             Rate hardburn_1 L~R_3 L~R_2 tachyon hardburn_2 delirium kvale  tye  L~R BrowserUk
hardburn_1 2.15/s         --   -8%  -16%    -17%       -18%     -19%  -22% -23% -23%      -90%
L~R_3      2.35/s         9%    --   -8%     -9%       -11%     -12%  -15% -16% -16%      -90%
L~R_2      2.56/s        19%    9%    --     -1%        -3%      -4%   -7%  -8%  -8%      -89%
tachyon    2.58/s        20%   10%    1%      --        -2%      -3%   -6%  -7%  -7%      -89%
hardburn_2 2.63/s        22%   12%    3%      2%         --      -1%   -5%  -6%  -6%      -88%
delirium   2.67/s        24%   14%    4%      3%         1%       --   -3%  -4%  -4%      -88%
kvale      2.76/s        28%   17%    8%      7%         5%       3%    --  -1%  -1%      -88%
tye        2.78/s        29%   19%    9%      8%         6%       4%    1%   --  -0%      -88%
L~R        2.79/s        30%   19%    9%      8%         6%       5%    1%   0%   --      -88%
BrowserUk  22.5/s       944%  857%  778%    770%       755%     743%  715% 708% 705%        --
```
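If the chart is unfamiliar: as I understand `Benchmark`'s `cmpthese`, each cell shows how much faster the row's rate is than the column's, computed as `100 * row_rate / col_rate - 100` and rounded to a whole percent. The sketch below recomputes a few cells from the rounded rates printed above. Because it starts from rounded rates, some cells can come out a few points off; the BrowserUk row is one example. The `pct_faster` helper is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates copied from the cmpthese chart above (iterations per second).
my %rate = (
    'hardburn_1' => 2.15,
    'L~R_3'      => 2.35,
    'L~R'        => 2.79,
);

# Hypothetical helper: relative speed of $row versus $col, formatted like
# a cmpthese cell (100 * row_rate / col_rate - 100, rounded).
sub pct_faster {
    my ($row, $col) = @_;
    return sprintf "%.0f%%", 100 * $rate{$row} / $rate{$col} - 100;
}

for my $pair ( ['L~R', 'hardburn_1'], ['hardburn_1', 'L~R'], ['L~R_3', 'hardburn_1'] ) {
    my ($row, $col) = @$pair;
    printf "%-10s vs %-10s %s\n", $row, $col, pct_faster($row, $col);
}
```

The three pairs come out as 30%, -23% and 9%, which match the corresponding cells in the chart.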
Here is the script I used to build the list:
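```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write 10,000 random lowercase strings, 10 to 249 characters each,
# one per line, to list.rnd.
open (LIST, '>', "list.rnd") or die "Unable to open list.rnd : $!";
for ( 1 .. 10_000 ) {
    my $length = int(rand 240) + 10;
    my $string = "";
    $string .= ('a' .. 'z')[ rand 26] while length $string < $length;
    print LIST "$string\n";
}
close LIST or die "Unable to close list.rnd : $!";
```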
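Every entry in the benchmark below skips `$str_b` unless its first character matches `$str_a`'s, so it helps to know how first letters are spread across the list. The sketch below counts strings per first letter. It assumes `list.rnd` was written by the script above, with one string per line. If the file is missing, it falls back to building a list with the same recipe. With 26 equally likely letters, each bucket should hold about 10,000 / 26, or roughly 385 strings, and that is about how often `expensive_function` should run per pass. `first_char_histogram` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how many strings start with each character.
sub first_char_histogram {
    my %count;
    $count{ substr $_, 0, 1 }++ for @_;
    return \%count;
}

# Read list.rnd if it exists; otherwise build a list the same way the
# generator script does (10,000 strings of 10..249 random lowercase letters).
my @list;
if ( open my $fh, '<', 'list.rnd' ) {
    chomp( @list = <$fh> );
    close $fh;
}
else {
    for ( 1 .. 10_000 ) {
        my $length = int(rand 240) + 10;
        my $string = "";
        $string .= ('a' .. 'z')[ rand 26 ] while length $string < $length;
        push @list, $string;
    }
}

my $hist = first_char_histogram(@list);
printf "%s %d\n", $_, $hist->{$_} for sort keys %$hist;
printf "expected per letter: about %.0f\n", @list / 26;
```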
And finally, here is the benchmark itself:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Load the strings written by the generator script.
my @list;
open (LIST, '<', 'list.rnd') or die "Unable to open list.rnd for reading : $!";
while ( <LIST> ) {
    chomp;
    push @list, $_;
}

# The string to compare against, built the same way as the list entries.
my $length = int(rand 240) + 10;
my $str_a = "";
$str_a .= ('a' .. 'z')[ rand 26] while length $str_a < $length;

# Stand-in for the real work: walks every character of both strings.
sub expensive_function {
    my ($str_a, $str_b) = @_;
    my $foo;
    for ( split // , $str_a . $str_b) {
        $foo++;
    }
}

# Each entry should call expensive_function only when $str_b starts
# with the same character as $str_a.
cmpthese -5, {
    'tachyon' => sub {
        use Inline C =>;
        for my $str_b ( @list ) {
            next if ! same_scan( $str_a, $str_b );
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R' => sub {
        for my $str_b ( @list ) {
            next if index($str_a, substr($str_b, 0, 1));
            expensive_function ( $str_a, $str_b );
        }
    },
    'hardburn_1' => sub {
        my ($a_1st, $rest) = split '', $str_a, 2;
        for my $str_b ( @list ) {
            my ($b_1st, $rest) = split '', $str_b, 2;
            next if $b_1st ne $a_1st;
            expensive_function ( $str_a, $str_b );
        }
    },
    'hardburn_2' => sub {
        my $rev_a = reverse $str_a;
        my $a_1st = chop $rev_a;
        for my $str_b ( @list ) {
            my $rev_b = reverse $str_b;
            my $b_1st = chop $rev_b;
            next if $b_1st ne $a_1st;
            expensive_function ( $str_a, $str_b );
        }
    },
    'delirium' => sub {
        my $fc = substr($str_a,0,1);
        for my $str_b ( @list ) {
            next if $str_b !~ /^$fc/;
            expensive_function ( $str_a, $str_b );
        }
    },
    'BrowserUk' => sub {
        my %list;
        @list{ map{ substr $_, 0, 1 } @list } = ();
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
            next if exists $list{ $fc };
            expensive_function ( $str_a, $str_b );
        }
    },
    'tye' => sub {
        for my $str_b ( @list ) {
            next if ord( $str_a ) != ord( $str_b );
            expensive_function ( $str_a, $str_b );
        }
    },
    'kvale' => sub {
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
            next if $fc ne substr $str_b, 0, 1;
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R_2' => sub {
        my $fc = unpack "a1" , $str_a;
        for my $str_b ( @list ) {
            next if $fc ne unpack "a1", $str_b;
            expensive_function ( $str_a, $str_b );
        }
    },
    'L~R_3' => sub {
        my ($fc_a) = $str_a =~ /^(.)/;
        for my $str_b ( @list ) {
            my ($fc_b) = $str_b =~ /^(.)/;
            next if $fc_a ne $fc_b;
            expensive_function ( $str_a, $str_b );
        }
    },
};

__END__
__C__
int same_scan(char* str1, char* str2) {
    return str1[0] == str2[0] ? 1 : 0;
}
```
Note: I have not had a chance to verify that each of these tests calls expensive_function the same number of times, so please check back if you are interested. I need to head home now.
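One way to check the call counts is to replace `expensive_function` with a counter and run each entry once. The rough sketch below works under a few assumptions. It regenerates the list and `$str_a` with the same recipe as the scripts above, using a hypothetical `random_string` helper, so no `list.rnd` is needed. It copies a few of the benchmark subs unchanged. It then compares each call count against the number of strings whose first character matches `$str_a`'s.

One thing this check should flag: the BrowserUk entry tests `exists $list{ $fc }`, which never looks at `$str_b`, so each pass skips either every string or none. All 26 letters are almost certainly present among 10,000 strings, so it skips them all. That would go a long way toward explaining its roughly 10x rate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one random string, same recipe as the original scripts.
sub random_string {
    my $length = int(rand 240) + 10;
    my $string = "";
    $string .= ('a' .. 'z')[ rand 26 ] while length $string < $length;
    return $string;
}

my @list;
push @list, random_string() for 1 .. 10_000;
my $str_a = random_string();

# Stand-in for expensive_function: count calls instead of doing work.
my $calls = 0;
sub expensive_function { $calls++ }

# A few benchmark entries, copied unchanged; add the rest the same way.
my %tests = (
    'L~R' => sub {
        for my $str_b ( @list ) {
            next if index($str_a, substr($str_b, 0, 1));
            expensive_function ( $str_a, $str_b );
        }
    },
    'delirium' => sub {
        my $fc = substr($str_a,0,1);
        for my $str_b ( @list ) {
            next if $str_b !~ /^$fc/;
            expensive_function ( $str_a, $str_b );
        }
    },
    'tye' => sub {
        for my $str_b ( @list ) {
            next if ord( $str_a ) != ord( $str_b );
            expensive_function ( $str_a, $str_b );
        }
    },
    'kvale' => sub {
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
            next if $fc ne substr $str_b, 0, 1;
            expensive_function ( $str_a, $str_b );
        }
    },
    'BrowserUk' => sub {
        my %list;
        @list{ map{ substr $_, 0, 1 } @list } = ();
        my $fc = substr $str_a, 0, 1;
        for my $str_b ( @list ) {
            next if exists $list{ $fc };
            expensive_function ( $str_a, $str_b );
        }
    },
);

# Reference count: strings whose first character matches $str_a's.
my $expected = grep { substr($_, 0, 1) eq substr($str_a, 0, 1) } @list;

for my $name ( sort keys %tests ) {
    $calls = 0;
    $tests{$name}->();
    printf "%-10s %6d calls%s\n", $name, $calls,
        $calls == $expected ? '' : "  <-- expected $expected";
}
```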

Cheers - L~R

Update: See updated benchmark.


In reply to Re: Matching First Character of Strings Efficiently by Limbic~Region
in thread Matching First Character of Strings Efficiently by Limbic~Region
