Re^2: Upper case and chomp

I prefer using uc and a string comparison, as the OP suggested. I ran a quick benchmark, and it looks like it's faster: the difference is negligible if no strings match, and uc is roughly double the speed if they all match.

if (uc($listaccountlocked[1]) eq 'TRUE') {

Trivial benchmark:

$ cat ucmatch_vs_regex.pl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);

my @words;

my @letters = qw( A B C D a b c d );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( d e f g h i k l );
build_words(\@letters, \@letters, \@letters, \@letters);
compare();

@letters = qw( B B B B b b b b );
build_words( [qw(A A A A a a a a)], \@letters, [qw(C C C C c c c c)], \@letters);
compare();

sub build_words {
        @words=();
        my ($rA, $rB, $rC, $rD) = @_;
        for my $a (@$rA) {
                for my $b (@$rB) {
                        for my $c (@$rC) {
                                for my $d (@$rD) {
                                        push @words, "$a$b$c$d";
                                }
                        }
                }
        }
}

sub compare {
        my $v = @words;
        my $t = regex();
        my $u = uccmp();
        if ($t != $u) {
                die "Functions don't return the same value! regex=$t, uccmp=$u\n";
        }
        print "In $v words, $t are 'abcb'\n";
        cmpthese(-5, {
                regex => sub { regex() },
                uccmp => sub { uccmp() },
        } );
}

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /abcb/i;
        }
        return $cnt;
}

sub uccmp {
        my $cnt=0;
        for (@words) {
                ++$cnt if uc($_) eq 'ABCB';
        }
        return $cnt;
}
$ perl ucmatch_vs_regex.pl
In 4096 words, 16 are 'abcb'
        Rate regex uccmp
regex  939/s    --   -7%
uccmp 1008/s    7%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp 1028/s    --   -3%
regex 1055/s    3%    --
In 4096 words, 4096 are 'abcb'
       Rate regex uccmp
regex 414/s    --  -52%
uccmp 865/s  109%    --

$

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^3: Upper case and chomp by kennethk (Abbot) on Feb 10, 2011 at 17:09 UTC
You have not used a fair test: note that my regular expression included the metacharacters `^` and `$`. If I reconfigure your tests to consider this factor and swap to:

sub regex {
        my $cnt=0;
        for (@words) {
                ++$cnt if /^abcb$/i;
        }
        return $cnt;
}

I get the output:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp 1192/s    --   -5%
regex 1255/s    5%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp 1225/s    --   -3%
regex 1264/s    3%    --
In 4096 words, 4096 are 'abcb'
        Rate regex uccmp
regex  970/s    --  -23%
uccmp 1260/s   30%    --

which obviously compares much better. This still does not consider that the string compare requires a chomp, which the regular expression does not. Modifying your benchmark to consider this:

Read more... (2 kB)

yields the results:

In 4096 words, 16 are 'abcb'
        Rate uccmp regex
uccmp  812/s    --  -32%
regex 1197/s   47%    --
In 4096 words, 0 are 'abcb'
        Rate uccmp regex
uccmp  861/s    --  -31%
regex 1255/s   46%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 856/s    --  -11%
regex 964/s   13%    --

which I think clearly favors the regular expression. In addition, if you really wanted to squeeze out performance, you could skip the split in the OP as well:

Read more... (2 kB)

which yields:

In 4096 words, 16 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 767/s  283%    --
In 4096 words, 0 are 'abcb'
       Rate uccmp regex
uccmp 200/s    --  -74%
regex 756/s  278%    --
In 4096 words, 4096 are 'abcb'
       Rate uccmp regex
uccmp 206/s    --  -66%
regex 603/s  193%    --

Update: As ikegami points out, I failed to localize the arrays to the test routines, so there were a large number of no-ops. Fixing that by adding `my @words = @words` where appropriate reduced the margins but maintained the ordering. I suspect that is just a function of the linear overhead of copying the large arrays. If this is incorrect, I would appreciate insight.
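A minimal sketch of why the anchored pattern can skip the chomp that the string comparison needs: without the /m modifier, `$` matches at the very end of the string or just before a single trailing newline. The sample strings and the sub names (unanchored, anchored, uc_eq) are hypothetical and chosen only for illustration; only the 'abcb' target comes from the benchmark above.

```perl
use strict;
use warnings;

# Hypothetical sample values; a trailing "\n" mimics a line read from a file.
my @samples = ("abcb", "ABCB\n", "xabcbx");

# Unanchored, case-insensitive: matches 'abcb' anywhere in the string.
sub unanchored { return $_[0] =~ /abcb/i ? 1 : 0 }

# Anchored: '$' matches at the end of the string or just before a final newline,
# so a line that still carries its "\n" can match without being chomped.
sub anchored { return $_[0] =~ /^abcb$/i ? 1 : 0 }

# uc + eq compares the whole string, so the newline must be removed first.
# chomp works on a private copy here and leaves the caller's data alone.
sub uc_eq {
    my ($s) = @_;
    chomp $s;
    return uc($s) eq 'ABCB' ? 1 : 0;
}

for my $s (@samples) {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-10s unanchored=%d anchored=%d uc_eq=%d\n",
        "'$shown'", unanchored($s), anchored($s), uc_eq($s);
}
```

Note that the unanchored pattern also accepts 'xabcbx', which is why it is not equivalent to the uc comparison even though both count the same words in the four-letter test data.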
Re^4: Upper case and chomp by ikegami (Patriarch) on Feb 10, 2011 at 19:12 UTC
Still bad. After the first pass, there is nothing left to chomp. You need to add `my @words = @words;` to the start of both functions.
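The effect described above can be sketched as follows. This is not the collapsed benchmark code from the thread, only an illustration under stated assumptions: the three-word data set and the sub names (uccmp_in_place, uccmp_copy, newlines_left) are hypothetical. Because `for (@words)` aliases $_ to each element, chomp modifies the shared array, and every later benchmark iteration works on already-chomped strings.

```perl
use strict;
use warnings;

# Hypothetical data: each word carries a trailing newline, as if read from a file.
my @words = map { "$_\n" } qw(abcb ABCB abcd);

# Destructive version: $_ is an alias to the array element,
# so chomp strips the newline from the shared @words itself.
sub uccmp_in_place {
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Copying version: the lexical @words shadows the outer array inside this sub,
# so every call starts from unchomped data and leaves the original intact.
sub uccmp_copy {
    my @words = @words;
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Counts how many elements of the shared array still end in a newline.
sub newlines_left { return scalar grep { /\n\z/ } @words }

print "before:            ", newlines_left(), " words end in a newline\n";
uccmp_copy();
print "after copy:        ", newlines_left(), " words end in a newline\n";
uccmp_in_place();
print "after in-place:    ", newlines_left(), " words end in a newline\n";
```

The copy adds a cost proportional to the array size on every call, so it should be added to every function being compared to keep the comparison even.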
Re^4: Upper case and chomp by roboticus (Chancellor) on Feb 10, 2011 at 22:57 UTC
kennethk:

I'm sorry I misrepresented your test. I didn't even think about the effect of adding the anchors. Oh, well. Thankfully, you were able to get things fixed up with ikegami, so we now have a better comparison.

...roboticus
Re^3: Upper case and chomp by hbm (Hermit) on Feb 10, 2011 at 16:12 UTC
Another option is string escapes:

if ("\U$listaccountlocked[1]" eq 'TRUE') {

I added another sub to your benchmark:

sub slashU {
        my $cnt=0;
        for (@words) {
                ++$cnt if "\U$_" eq 'ABCB';
        }
        return $cnt;
}

But it didn't fare very well:

In 4096 words, 16 are 'abcb'
        Rate slashU  uccmp  regex
slashU 321/s     --   -18%   -26%
uccmp  390/s    21%     --   -10%
regex  435/s    36%    12%     --
In 4096 words, 0 are 'abcb'
        Rate slashU  uccmp  regex
slashU 328/s     --   -17%   -25%
uccmp  397/s    21%     --   -10%
regex  439/s    34%    11%     --
In 4096 words, 4096 are 'abcb'
        Rate  regex slashU  uccmp
regex  266/s     --    -6%   -18%
slashU 282/s     6%     --   -13%
uccmp  325/s    22%    15%     --

Ah well.
Re^4: Upper case and chomp by ikegami (Patriarch) on Feb 10, 2011 at 19:02 UTC
if ("\U$listaccountlocked[1]" eq 'TRUE') {

is just an obfuscated way of writing

if (uc($listaccountlocked[1]) eq 'TRUE') {

From a performance point of view, the work done by the former is a proper superset of the work done by the latter. Not only does \U call uc(), it also creates an extra copy of the string.

$ perl -MO=Concise,-exec -e'my $y = "\U$x";'
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <$> gvsv(x) s
4  <1> uc[t2] sK/1
5  <@> stringify[t3] sK/1            <--- This addition is the only
6  <0> padsv[$y:1,2] sRM/LVINTRO          difference. It creates a
7  <2> sassign vKS/2                      copy of the string.
8  <@> leave[1 ref] vKP/REFC
-e syntax OK

$ perl -MO=Concise,-exec -e'my $y = uc($x);'
1  <0> enter
2  <;> nextstate(main 1 -e:1) v:{
3  <$> gvsv(x) s
4  <1> uc[t2] sK/1
5  <0> padsv[$y:1,2] sRM/LVINTRO
6  <2> sassign vKS/2
7  <@> leave[1 ref] vKP/REFC
-e syntax OK
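As a quick check of the claim that the two spellings are interchangeable in their result, here is a minimal sketch. The helper names via_escape and via_uc and the sample values are hypothetical; the sketch only compares results and says nothing about speed.

```perl
use strict;
use warnings;

# Upper-case through the \U string escape (interpolation inside double quotes).
sub via_escape { my ($s) = @_; return "\U$s" }

# Upper-case through a direct call to the uc built-in.
sub via_uc { my ($s) = @_; return uc($s) }

# Hypothetical sample values in mixed case.
for my $s (qw(true True TRUE tRuE)) {
    my $same = via_escape($s) eq via_uc($s) ? 'same' : 'DIFFERENT';
    print "$s: \\U gives ", via_escape($s), ", uc gives ", via_uc($s), " ($same)\n";
}
```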