You have not used a fair test: note that my regular expression included the metacharacters ^ and $. If I adjust your tests to account for this and change the regex sub to:

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^abcb$/i;
        }
        return $cnt;
}

I get the output:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp 1192/s    --   -5%
regex 1255/s    5%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp 1225/s    --   -3%
regex 1264/s    3%    --
In 4096 words, 4096 are 'abcb'
        Rate regex uccmp
regex  970/s    --  -23%
uccmp 1260/s   30%    --

which obviously compares much better. This still does not account for the fact that the string comparison requires a chomp, which the regular expression does not. Modifying your benchmark to include it:

use strict;
use warnings;
use Benchmark qw(:all);

my @words;

my @letters = qw( A B C D a b c d );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( d e f g h i k l );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( B B B B b b b b );
build_words( [qw(A A A A a a a a)], \@letters, [qw(C C C C c c c c)], \@letters);
compare();

sub build_words {
        @words=();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
                for my $b (@$rB) {
                        for my $c (@$rC) {
                                for my $d (@$rD) {
                                        push @words, "$a$b$c$d\n";
                                }
                        }
                }
        }
}

sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
                die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
                regex => sub { regex() },
                uccmp => sub { uccmp() },
        } );
}

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^abcb$/i;
        }
        return $cnt;
}

sub uccmp {
        my $cnt=0;
        for (@words) {
                chomp;
                ++$cnt if uc($_) eq 'ABCB';
        }
        return $cnt;
}

yields the results:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp  812/s    --  -32%
regex 1197/s   47%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp  861/s    --  -31%
regex 1255/s   46%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 856/s    --  -11%
regex 964/s   13%    --

which I think clearly favors the regular expression. In addition, if you really wanted to squeeze out performance, you could skip the split in the OP as well:

use strict;
use warnings;
use Benchmark qw(:all);

my @words;

my @letters = qw( A B C D a b c d );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( d e f g h i k l );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( B B B B b b b b );
build_words( [qw(A A A A a a a a)], \@letters, [qw(C C C C c c c c)], \@letters);
compare();

sub build_words {
        @words=();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
                for my $b (@$rB) {
                        for my $c (@$rC) {
                                for my $d (@$rD) {
                                        push @words, "key=$a$b$c$d\n";
                                }
                        }
                }
        }
}

sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
                die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
                regex => sub { regex() },
                uccmp => sub { uccmp() },
        } );
}

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^[^=]*=abcb(?:=|$)/i;
        }
        return $cnt;
}

sub uccmp {
        my $cnt=0;
        for (@words) {
                chomp;
                ++$cnt if uc( (split /=/)[1] ) eq 'ABCB';
        }
        return $cnt;
}

which yields:

In 4096 words, 16 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 767/s  283%    --
In 4096 words, 0 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 756/s  278%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 206/s    --  -66%
regex 603/s  193%    --
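
For anyone wondering why the regex version can skip both the chomp and the split, here is a minimal sketch. It relies on the fact that $ also matches just before a trailing newline. The sample lines and the helper names match_regex and match_split are made up for illustration and are not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical sample lines, not taken from the benchmark.
my @lines = ("key=abcb\n", "key=ABCB=extra\n", "key=abcbx\n", "other=abcd\n");

# Regex approach: no chomp and no split, because $ also matches
# just before a trailing newline.
sub match_regex {
    my ($line) = @_;
    return ($line =~ /^[^=]*=abcb(?:=|$)/i) ? 1 : 0;
}

# String approach: chomp a copy, split on '=', compare the second field.
sub match_split {
    my ($line) = @_;
    chomp $line;    # $line is a copy, so the caller's data is untouched
    my $value = (split /=/, $line)[1];
    return (defined($value) && uc($value) eq 'ABCB') ? 1 : 0;
}

for my $line (@lines) {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible
    printf "%-18s regex=%d split=%d\n",
        $shown, match_regex($line), match_split($line);
}
```

Both subs should agree on every line here. Without the chomp, the split version would compare "ABCB\n" against 'ABCB' and miss the first line.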

Update: As ikegami points out, I failed to localize the arrays to the test routines, so chomp modified @words in place and most of the chomp calls were no-ops. Fixing that by adding my @words = @words; where appropriate reduced the margins but maintained the ordering. I suspect the reduction is just the linear overhead of copying the large arrays. If this is incorrect, I would appreciate insight.
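
A small sketch of the aliasing problem, assuming a three-element stand-in for @words. The names count_newlines, uccmp_inplace and uccmp_copy are hypothetical. Inside for (@words), $_ is an alias for each element, so chomp strips the newlines from the array itself and every later pass has nothing left to remove.

```perl
use strict;
use warnings;

# Tiny stand-in for the benchmark's @words.
my @words = ("abcb\n", "ABCB\n", "dddd\n");

# Counts how many of the given strings still end in a newline.
sub count_newlines { return scalar(grep { /\n\z/ } @_) }

# In-place version: $_ aliases each element, so chomp edits @words itself.
sub uccmp_inplace {
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Localized version: work on a copy, so every call really chomps.
sub uccmp_copy {
    my @copy = @words;
    my $cnt  = 0;
    for (@copy) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

print "before:        ", count_newlines(@words), " newlines\n";
uccmp_copy();
print "after copy:    ", count_newlines(@words), " newlines\n";
uccmp_inplace();
print "after inplace: ", count_newlines(@words), " newlines\n";
```

The newline count should stay at 3 after uccmp_copy and drop to 0 after uccmp_inplace. The copy is what makes each timed iteration do the same work, at the cost of the array copy itself.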

In reply to Re^3: Upper case and chomp by kennethk
in thread Upper case and chomp by TechFly
