Re^2: Simple regular expression problem

It's better to avoid the non-greedy `?` modifier in most cases, as it's less efficient than the alternatives. Here's a benchmark:

#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';
use Test::More tests => 2;

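# Package (our) variables, because cmpthese evals the code strings below
# inside Benchmark, where file-scoped lexicals are not visible.
our @data = qw 'foo123 abdbdr23 abcd2 abc 1234 foo!123';
our (@plain, @sticky);

# Expected (prefix, digits) pairs; 'foo!123' fails both patterns, giving [].
my @expected = ([qw 'foo 123'], [qw 'abdbdr 23'], [qw 'abcd 2'],
                ['abc', ''], ['', 1234], []);

# Run each snippet for at least 1 CPU second and compare the rates.
cmpthese -1, {
    plain   => '@plain  = map {[/^([a-z]*)(\d*)$/]} @data',  # letters only, no ?
    sticky  => '@sticky = map {[/^(\w*?)(\d*)$/]} @data',    # non-greedy \w*?
};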

is_deeply \@plain, \@expected;
is_deeply \@sticky, \@expected;

__END__
1..2
          Rate sticky  plain
sticky 32582/s     --   -17%
plain  39385/s    21%     --
ok 1
ok 2
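To see why the `?` is there at all, and why the plain version can drop it, here is a small sketch showing how greedy `\w*`, non-greedy `\w*?` and the digit-free class `[a-z]*` split the first test string, 'foo123'. The variable names are just for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $s = 'foo123';

# Greedy \w* also matches digits, so it takes the whole string and
# (\d*) is left to match the empty string before $.
my @greedy = $s =~ /^(\w*)(\d*)$/;

# Non-greedy \w*? starts empty and takes one more character each time
# the rest of the pattern, (\d*)$, fails to match.
my @lazy = $s =~ /^(\w*?)(\d*)$/;

# [a-z]* cannot match digits, so it stops at the first digit by itself.
my @plain = $s =~ /^([a-z]*)(\d*)$/;

for (['greedy \w*', \@greedy], ['lazy \w*?', \@lazy], ['[a-z]*', \@plain]) {
    my ($label, $caps) = @$_;
    printf "%-11s => (%s)\n", $label, join ', ', map { "'$_'" } @$caps;
}
```

The greedy pattern matches but puts the digits in the wrong group; the other two agree, and the benchmark measures the cost of reaching that split by growing the lazy match one character at a time versus using a narrower character class.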

Perl --((8:>*

Re^3: Simple regular expression problem by eric256 (Parson) on Oct 03, 2005 at 14:57 UTC
Benchmarking is fun. However, you should consider your results a little more carefully before basing recommendations on them. This would definitely count as a minor optimization at best, since we are talking about 32k instead of 40k per second. That means unless you are doing hundreds of thousands of these compares, you are never going to notice the difference. Also interesting is the result of that benchmark on my machine:

1..2
          Rate  plain sticky
plain  23682/s     --    -1%
sticky 23904/s     1%     --
ok 1
ok 2

Oddly, the difference dropped to a mere couple of hundred per second.

___________
Eric Hodges
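Converting the first benchmark's rates into time per call makes the size of the gap concrete. The per-match figure assumes, as the benchmark code shows, that each iteration runs the regex over the six strings in `@data`:

```latex
\begin{aligned}
t_{\text{sticky}} &= 1/32582\ \text{s} \approx 30.7\ \mu\text{s per iteration}\\
t_{\text{plain}}  &= 1/39385\ \text{s} \approx 25.4\ \mu\text{s per iteration}\\
\Delta t &\approx 5.3\ \mu\text{s per iteration} \approx 5.3/6 \approx 0.9\ \mu\text{s per match}\\
10^{5}\ \text{matches} \times 0.9\ \mu\text{s} &\approx 0.09\ \text{s}
\end{aligned}
```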
Re^4: Simple regular expression problem by Perl Mouse (Chaplain) on Oct 03, 2005 at 15:30 UTC
Well, that would be 192K vs. 240K matches per second, as the test does 6 regex matches per iteration. However, if we test against the string `('a3' x 100) . '!3'`, the difference would be:

           Rate sticky  plain
sticky  13273/s     --   -95%
plain  270490/s  1938%     --

Don't dismiss benchmarks too early as "an insignificant difference".

Perl --((8:>*
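A rough way to see where the time goes on that string is to count how often each pattern reaches the point just after its first group. The sketch below does this with Perl's `(?{ ... })` code-block construct and a counter (`$tries`, a name made up for this example); it illustrates the backtracking and is not the benchmark itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Package variable bumped by the (?{ ... }) code blocks below. Each bump
# means the engine has settled on one more candidate length for group 1
# and is about to try the rest of the pattern, (\d*)$.
our $tries;

# The pathological input: 200 word characters, then '!3'.
my $str = ('a3' x 100) . '!3';

$tries = 0;
my $sticky_ok    = $str =~ /^(\w*?)(?{ $tries++ })(\d*)$/;
my $sticky_tries = $tries;

$tries = 0;
my $plain_ok    = $str =~ /^([a-z]*)(?{ $tries++ })(\d*)$/;
my $plain_tries = $tries;

printf "sticky: %s after %d tries\n", ($sticky_ok ? 'match' : 'no match'), $sticky_tries;
printf "plain:  %s after %d tries\n", ($plain_ok  ? 'match' : 'no match'), $plain_tries;
```

Both patterns fail, but with `\w*?` the count grows with the length of the word-character run, since every extra character is another attempt at `(\d*)$`. `[a-z]*` stops at the first digit, so only a handful of attempts are made before the match fails.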
Re^5: Simple regular expression problem by eric256 (Parson) on Oct 03, 2005 at 16:31 UTC
While that looks impressive and is certainly interesting, you missed my point. Even with that huge difference, you are still talking about fractions of a second unless you are dealing with at least several thousand records. If you are only doing 100 matches, or even 1,000 matches, you aren't going to see the difference. Are there certain things to optimize for when you are using large strings and large datasets? Of course. Are those optimizations things that you should always keep in mind? I don't think so.

In this case I believe that clear meaning is better, and either solution might be clearer depending on the situation and the programmer. However, my point was that your benchmark shouldn't cause anyone to avoid '?' in general just because of performance. There are certainly cases where you are right (for instance, 10k strings that are 200+ characters long), but I think that for most general cases the performance difference is minor and insignificant.

___________
Eric Hodges
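The same conversion for the pathological string gives a feel for the threshold described above. This assumes each iteration of the second benchmark matched that one string once; the modified benchmark code isn't shown in the thread:

```latex
\begin{aligned}
t_{\text{sticky}} &= 1/13273\ \text{s} \approx 75.3\ \mu\text{s per match}\\
t_{\text{plain}}  &= 1/270490\ \text{s} \approx 3.7\ \mu\text{s per match}\\
\Delta t &\approx 71.6\ \mu\text{s per match}\\
10^{3}\ \text{matches} &\Rightarrow \approx 0.07\ \text{s}, \qquad 10^{4}\ \text{matches} \Rightarrow \approx 0.72\ \text{s}
\end{aligned}
```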
Re^3: Simple regular expression problem by polypompholyx (Chaplain) on Oct 03, 2005 at 14:31 UTC
The OP didn't make it clear whether the string before the number could contain digits. However, it's certainly better to be specific in a regex: if you know (for some value of 'know') that something will only contain `[A-Za-z]`, not `\w`, then the former is probably preferable. On the other hand, `[A-Za-z]` too often means "I cannot think of any other letters", and then your script barfs on something perfectly valid but unexpected, like "Ångström".
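A minimal sketch of the failure described above, assuming the script's source is saved as UTF-8 (hence `use utf8`). `\p{L}` (any Unicode letter) is one common alternative to `[A-Za-z]`; whether it is the right choice depends on what the data may contain:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                            # string literals below are UTF-8 characters
binmode STDOUT, ':encoding(UTF-8)';  # print them back out as UTF-8

my %result;
for my $word ('Angstrom', 'Ångström') {
    # ASCII-only letter class: anything outside A-Z/a-z makes the match fail.
    my $ascii = $word =~ /^[A-Za-z]+$/ ? 1 : 0;
    # \p{L} matches a letter from any script, accented ones included.
    my $letter = $word =~ /^\p{L}+$/ ? 1 : 0;
    $result{$word} = { ascii => $ascii, letter => $letter };
    printf "%-8s  [A-Za-z]+: %-3s  \\p{L}+: %s\n",
        $word, ($ascii ? 'yes' : 'no'), ($letter ? 'yes' : 'no');
}
```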
Re^3: Simple regular expression problem by jeanluca (Deacon) on Oct 03, 2005 at 14:59 UTC
Thanks for all the suggestions. I fixed it with `\w+?`, or maybe I'll use the alpha example! Now that I understand my mistake, I see that it was described in the perldoc manual all along. All your replies were really helpful, thanks a lot!

Luca
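One difference between `\w+?` and the `\w*?` used in the benchmark is worth knowing: `+?` still takes as little as possible, but never less than one character, so a string made only of digits is split differently. A quick sketch using three of the benchmark's test strings (the `%split` hash is just for this illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %split;
for my $s (qw(foo123 abc 1234)) {
    # Non-greedy, may be empty: an all-digit string leaves group 1 empty.
    my @star = $s =~ /^(\w*?)(\d*)$/;
    # Non-greedy, but group 1 must take at least one character.
    my @plus = $s =~ /^(\w+?)(\d*)$/;
    $split{$s} = { star => \@star, plus => \@plus };
    printf "%-7s \\w*? => (%s)   \\w+? => (%s)\n", $s,
        join(', ', map { "'$_'" } @star),
        join(', ', map { "'$_'" } @plus);
}
```

Both agree on 'foo123' and 'abc'; only the all-digit case differs ('' and '1234' versus '1' and '234'), which matters if strings like '1234' can occur in the real data.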