Re^3: Upper case and chomp

You have not used a fair test: note that my regular expression included the metacharacters ^ and $. If I rework your tests to account for this and change the regex routine to:

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^abcb$/i;
        }
        return $cnt;
}

I get the output:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp 1192/s    --   -5%
regex 1255/s    5%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp 1225/s    --   -3%
regex 1264/s    3%    --
In 4096 words, 4096 are 'abcb'
        Rate regex uccmp
regex  970/s    --  -23%
uccmp 1260/s   30%    --

which obviously compares much better. This still does not account for the fact that the string comparison requires a chomp, which the regular expression does not. Modifying your benchmark to account for this:

use strict;
use warnings;
use Benchmark qw(:all);

my @words;

my @letters = qw( A B C D a b c d );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( d e f g h i k l );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( B B B B b b b b );
build_words( [qw(A A A A a a a a)], \@letters, [qw(C C C C c c c c)], \@letters);
compare();

sub build_words {
        @words=();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
                for my $b (@$rB) {
                        for my $c (@$rC) {
                                for my $d (@$rD) {
                                        push @words, "$a$b$c$d\n";
                                }
                        }
                }
        }
}

sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
                die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
                regex => sub { regex() },
                uccmp => sub { uccmp() },
        } );
}

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^abcb$/i;
        }
        return $cnt;
}

sub uccmp {
        my $cnt=0;
        for (@words) {
                chomp;
                ++$cnt if uc($_) eq 'ABCB';
        }
        return $cnt;
}

yields the results:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp  812/s    --  -32%
regex 1197/s   47%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp  861/s    --  -31%
regex 1255/s   46%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 856/s    --  -11%
regex 964/s   13%    --

which I think clearly favors the regular expression. In addition, if you really wanted to squeeze out performance, you could skip the split in the OP as well:

use strict;
use warnings;
use Benchmark qw(:all);

my @words;

my @letters = qw( A B C D a b c d );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( d e f g h i k l );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( B B B B b b b b );
build_words( [qw(A A A A a a a a)], \@letters, [qw(C C C C c c c c)], \@letters);
compare();

sub build_words {
        @words=();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
                for my $b (@$rB) {
                        for my $c (@$rC) {
                                for my $d (@$rD) {
                                        push @words, "key=$a$b$c$d\n";
                                }
                        }
                }
        }
}

sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
                die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
                regex => sub { regex() },
                uccmp => sub { uccmp() },
        } );
}

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^[^=]*=abcb(?:=|$)/i;
        }
        return $cnt;
}

sub uccmp {
        my $cnt=0;
        for (@words) {
                chomp;
                ++$cnt if uc( (split /=/)[1] ) eq 'ABCB';
        }
        return $cnt;
}

which yields:

In 4096 words, 16 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 767/s  283%    --
In 4096 words, 0 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 756/s  278%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 206/s    --  -66%
regex 603/s  193%    --

Update: As ikegami points out, I failed to localize the arrays within the test routines, so after the first pass the chomp calls were no-ops. Adding `my @words = @words;` at the start of each routine reduced the margins but preserved the ordering. I suspect the reduction is just the linear overhead of copying the large arrays. If this is incorrect, I would appreciate insight.
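
For anyone following along, here is a minimal sketch of why the unlocalized chomp turns into a no-op. It is not the benchmark itself: the three-element @words list and the names uccmp_in_place, uccmp_copy and newline_count are made up for illustration. The point is only that $_ in a for loop aliases the array elements, so chomp strips the newlines from the shared array on the first call.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the benchmark's @words.
my @words = ("abcb\n", "ABCB\n", "dddd\n");

# chomp acts on $_, which aliases each element of @words,
# so the shared array loses its newlines on the first call.
sub uccmp_in_place {
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Copying first leaves the shared array intact, so every call
# has real newlines to chomp (at the cost of the copy).
sub uccmp_copy {
    my @copy = @words;
    my $cnt  = 0;
    for (@copy) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Number of elements in @words that still end in a newline.
sub newline_count { return scalar grep { /\n\z/ } @words }

print "newlines at start:          ", newline_count(), "\n";
uccmp_copy();
print "newlines after copy:        ", newline_count(), "\n";
uccmp_in_place();
print "newlines after in-place:    ", newline_count(), "\n";
```

Both routines return the same count, which is why the sanity check in compare() never caught the problem; only the state of @words differs afterwards.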

Re^4: Upper case and chomp by ikegami (Patriarch) on Feb 10, 2011 at 19:12 UTC
Still bad. After the first pass, there is nothing left to chomp. You need to add `my @words = @words;` to the start of both functions.
Re^4: Upper case and chomp by roboticus (Chancellor) on Feb 10, 2011 at 22:57 UTC
kennethk: I'm sorry I misrepresented your test. I didn't even think about the effect of adding the anchors. Oh, well. Thankfully, you were able to get things fixed up with ikegami, so we now have a better comparison.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
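
On the anchors and the chomp discussed in this thread, a small sketch may help. The sample strings are hypothetical. It shows that $ matches at the end of the string or just before a final newline, so the anchored regex needs no chomp, while a plain eq comparison sees the newline, and that without the anchors the pattern also matches inside longer strings.

```perl
use strict;
use warnings;

my $line = "AbCb\n";    # hypothetical input line, newline still attached

# "$" matches at the end of the string or just before a trailing newline.
my $regex_hit = ( $line =~ /^abcb$/i ) ? 1 : 0;

# eq compares the whole string, newline included, so it fails without chomp.
my $eq_raw = ( uc($line) eq 'ABCB' ) ? 1 : 0;

my $copy = $line;
chomp $copy;
my $eq_chomped = ( uc($copy) eq 'ABCB' ) ? 1 : 0;

# Without anchors the pattern may match anywhere inside a longer string.
my $longer        = "xAbCbx\n";
my $loose_hit     = ( $longer =~ /abcb/i )   ? 1 : 0;
my $anchored_hit  = ( $longer =~ /^abcb$/i ) ? 1 : 0;

print "anchored regex on line:   $regex_hit\n";
print "eq without chomp:         $eq_raw\n";
print "eq after chomp:           $eq_chomped\n";
print "unanchored on longer:     $loose_hit\n";
print "anchored on longer:       $anchored_hit\n";
```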