That is not a fair test: note that my regular expression included the metacharacters ^ and $. If I adjust your tests to account for this and swap in:
    sub regex {
        my $cnt = 0;
        for (@words) {
            ++$cnt if /^abcb$/i;
        }
        return $cnt;
    }
I get the output:

    In 4096 words, 16 are 'abcb'
            Rate uccmp regex
    uccmp 1192/s    --   -5%
    regex 1255/s    5%    --
    In 4096 words, 0 are 'abcb'
            Rate uccmp regex
    uccmp 1225/s    --   -3%
    regex 1264/s    3%    --
    In 4096 words, 4096 are 'abcb'
            Rate regex uccmp
    regex  970/s    --  -23%
    uccmp 1260/s   30%    --

which obviously compares much better. This still does not account for the fact that the string comparison requires a chomp, which the regular expression does not. Modifying your benchmark to take this into account:

    use strict;
    use warnings;
    use Benchmark qw(:all);

    my @words;

    # Mixed-case letters: 16 of the 4096 words match 'abcb'
    my @letters = qw( A B C D a b c d );
    build_words(\@letters, \@letters, \@letters, \@letters);
    compare();

    # No overlap with the target: 0 matches
    @letters = qw( d e f g h i k l );
    build_words(\@letters, \@letters, \@letters, \@letters);
    compare();

    # Every word is some casing of 'abcb': 4096 matches
    @letters = qw( B B B B b b b b );
    build_words( [qw(A A A A a a a a)], \@letters,
                 [qw(C C C C c c c c)], \@letters);
    compare();

    # Fill @words with every four-letter combination, one per line
    sub build_words {
        @words = ();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
            for my $b (@$rB) {
                for my $c (@$rC) {
                    for my $d (@$rD) {
                        push @words, "$a$b$c$d\n";
                    }
                }
            }
        }
    }

    # Check that both routines agree, then benchmark them
    sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
            die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
            regex => sub { regex() },
            uccmp => sub { uccmp() },
        } );
    }

    sub regex {
        my $cnt = 0;
        for (@words) {
            ++$cnt if /^abcb$/i;
        }
        return $cnt;
    }

    sub uccmp {
        my $cnt = 0;
        for (@words) {
            chomp;    # modifies @words in place (see the Update below)
            ++$cnt if uc($_) eq 'ABCB';
        }
        return $cnt;
    }
yields the results:
    In 4096 words, 16 are 'abcb'
            Rate uccmp regex
    uccmp  812/s    --  -32%
    regex 1197/s   47%    --
    In 4096 words, 0 are 'abcb'
            Rate uccmp regex
    uccmp  861/s    --  -31%
    regex 1255/s   46%    --
    In 4096 words, 4096 are 'abcb'
            Rate uccmp regex
    uccmp  856/s    --  -11%
    regex  964/s   13%    --
which I think clearly favors the regular expression. In addition, if you really wanted to squeeze out performance, you could skip the split in the OP as well:
    use strict;
    use warnings;
    use Benchmark qw(:all);

    my @words;

    my @letters = qw( A B C D a b c d );
    build_words(\@letters, \@letters, \@letters, \@letters);
    compare();

    @letters = qw( d e f g h i k l );
    build_words(\@letters, \@letters, \@letters, \@letters);
    compare();

    @letters = qw( B B B B b b b b );
    build_words( [qw(A A A A a a a a)], \@letters,
                 [qw(C C C C c c c c)], \@letters);
    compare();

    # Same word list as before, but each line is now "key=<word>\n"
    sub build_words {
        @words = ();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
            for my $b (@$rB) {
                for my $c (@$rC) {
                    for my $d (@$rD) {
                        push @words, "key=$a$b$c$d\n";
                    }
                }
            }
        }
    }

    sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
            die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
            regex => sub { regex() },
            uccmp => sub { uccmp() },
        } );
    }

    # Match the value directly: no split and no chomp
    sub regex {
        my $cnt = 0;
        for (@words) {
            ++$cnt if /^[^=]*=abcb(?:=|$)/i;
        }
        return $cnt;
    }

    # Split on '=', then compare the upper-cased second field
    sub uccmp {
        my $cnt = 0;
        for (@words) {
            chomp;    # modifies @words in place (see the Update below)
            ++$cnt if uc( (split /=/)[1] ) eq 'ABCB';
        }
        return $cnt;
    }
which yields:
    In 4096 words, 16 are 'abcb'
           Rate uccmp regex
    uccmp 200/s    --  -74%
    regex 767/s  283%    --
    In 4096 words, 0 are 'abcb'
           Rate uccmp regex
    uccmp 200/s    --  -74%
    regex 756/s  278%    --
    In 4096 words, 4096 are 'abcb'
           Rate uccmp regex
    uccmp 206/s    --  -66%
    regex 603/s  193%    --
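
On the chomp point above, here is a minimal sketch of why the regex gets away without one: a plain $ is allowed to match just before a trailing newline, whereas eq compares the whole string. The sample line and the variable names are made up for illustration; the \z case is included only to show the stricter anchor.

```perl
use strict;
use warnings;

# Hypothetical sample line, shaped like the benchmark's "$a$b$c$d\n" entries.
my $line = "AbCb\n";

# $ may match just before a trailing newline, so no chomp is needed.
my $regex_hit = $line =~ /^abcb$/i ? 1 : 0;

# \z only matches at the very end of the string, so the newline gets in the way.
my $strict_hit = $line =~ /^abcb\z/i ? 1 : 0;

# A plain string comparison sees the newline and fails...
my $eq_raw = uc($line) eq 'ABCB' ? 1 : 0;

# ...until a chomped copy is compared instead.
chomp(my $copy = $line);
my $eq_chomped = uc($copy) eq 'ABCB' ? 1 : 0;

print "regex=$regex_hit strict=$strict_hit eq_raw=$eq_raw eq_chomped=$eq_chomped\n";
# prints: regex=1 strict=0 eq_raw=0 eq_chomped=1
```

So the chomp is a real cost that only the string comparison has to pay, which is why it belongs inside the timed routine.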

Update: As ikegami points out, I failed to localize the arrays to the test routines. Because a for loop aliases $_ to each element, chomp modified @words in place, so every call after the first was chomping already-chomped strings, i.e. a large number of no-ops. Fixing that by adding my @words = @words; where appropriate reduced the margins but maintained the ordering. I suspect the reduction is just a function of the linear overhead of copying the large arrays. If this is incorrect, I would appreciate insight.
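
For anyone who wants to see the aliasing effect in isolation, here is a small sketch. It is not the benchmark itself: the two-element list and the sub names (newlines_left, count_on_copy, count_in_place) are invented for the demonstration, and I use a separately named @copy rather than my @words = @words purely for readability.

```perl
use strict;
use warnings;

# Tiny stand-in for the benchmark's word list.
my @words = ("abcb\n", "abcd\n");

# How many of the given strings still end in a newline?
sub newlines_left { return scalar grep { /\n\z/ } @_ }

# chomp on a private copy: the shared @words is left alone.
sub count_on_copy {
    my @copy = @words;
    my $cnt  = 0;
    for (@copy) { chomp; ++$cnt if uc($_) eq 'ABCB'; }
    return $cnt;
}

# chomp through the foreach alias: this edits @words itself.
sub count_in_place {
    my $cnt = 0;
    for (@words) { chomp; ++$cnt if uc($_) eq 'ABCB'; }
    return $cnt;
}

my $before         = newlines_left(@words);
my $copy_count     = count_on_copy();
my $after_copy     = newlines_left(@words);
my $in_place_count = count_in_place();
my $after_in_place = newlines_left(@words);

print "newlines: before=$before after_copy=$after_copy after_in_place=$after_in_place\n";
# prints: newlines: before=2 after_copy=2 after_in_place=0
```

Both routines count the same matches; the difference is that after the in-place version has run once, later iterations have nothing left to chomp, so the copy is what keeps each timed iteration doing the same work.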


In reply to Re^3: Upper case and chomp by kennethk
in thread Upper case and chomp by TechFly
