
It's better to avoid the non-greedy ? modifier in most cases, as it's less efficient than the alternatives. Here's a benchmark:
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark 'cmpthese';
use Test::More tests => 2;

# Package variables, because Benchmark evals the code strings below
# in package main, where lexicals from this file aren't visible.
our @data = qw 'foo123 abdbdr23 abcd2 abc 1234 foo!123';
our (@plain, @sticky);
my @expected = ([qw 'foo 123'], [qw 'abdbdr 23'], [qw 'abcd 2'],
                ['abc', ''], ['', 1234], []);

cmpthese -1, {
    plain  => '@plain  = map {[/^([a-z]*)(\d*)$/]} @data',   # greedy, letters only
    sticky => '@sticky = map {[/^(\w*?)(\d*)$/]} @data',     # lazy \w
};

# Both patterns must split every string the same way.
is_deeply \@plain,  \@expected;
is_deeply \@sticky, \@expected;

__END__
1..2
          Rate sticky  plain
sticky 32582/s     --   -17%
plain  39385/s    21%     --
ok 1
ok 2
Perl --((8:>*
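One way to see where the extra cost of the lazy version comes from is to count how often the engine gets past the first group. This is an illustrative sketch, not part of the original benchmark: it uses (?{ ... }) code blocks as a counter, and `attempts` is a hypothetical helper. A greedy [a-z]* grabs all the letters at once, while \w*? starts from an empty match and grows by one character per retry, so for a string like foo123 the sticky pattern should report more attempts than the plain one. Exact counts may vary with the perl version's optimizer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Package variable so that the (?{ ... }) code blocks can update it.
our $steps;

# Each pattern bumps $steps every time the engine gets past the first
# group, i.e. once for every length it tries for that group. Side
# effects inside (?{ ... }) are not undone on backtracking, so every
# attempt is counted.
my $plain  = qr/^([a-z]*)(?{ $steps++ })(\d*)$/;
my $sticky = qr/^(\w*?)(?{ $steps++ })(\d*)$/;

# Hypothetical helper: run one match, return the number of attempts.
sub attempts {
    my ($re, $string) = @_;
    local $steps = 0;
    $string =~ $re;
    return $steps;
}

# The same inputs as the benchmark above.
my @data = qw(foo123 abdbdr23 abcd2 abc 1234 foo!123);

for my $string (@data) {
    printf "%-9s plain: %d  sticky: %d\n",
        $string, attempts($plain, $string), attempts($sticky, $string);
}
```

Note that the lazy pattern does not lose on every input: for a string with no letter prefix, such as 1234, both succeed on the first attempt.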

Re^3: Simple regular expression problem
by eric256 (Parson) on Oct 03, 2005 at 14:57 UTC

    Benchmarking is fun. However, you should consider your results a little more carefully before making recommendations based on them. This would definitely count as a minor optimization at best, since we are talking about 32k instead of 40k per second. That means that unless you are doing hundreds of thousands of these matches, you are never going to notice the difference. Also interesting is the result of that benchmark on my machine:

    1..2
              Rate  plain sticky
    plain  23682/s     --    -1%
    sticky 23904/s     1%     --
    ok 1
    ok 2

    Oddly, the difference dropped to a mere couple of hundred per second, this time in sticky's favor.


    ___________
    Eric Hodges
      Well, that would be 192K vs. 240K matches per second, as the test runs 6 regex matches per iteration. However, if we test against the string:
      ('a3' x 100) . '!3'
      the difference would be:
                 Rate sticky  plain
      sticky  13273/s     --   -95%
      plain  270490/s  1938%     --
      Don't dismiss benchmarks too early as "an insignificant difference".
      Perl --((8:>*
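The reply doesn't show the exact script used for this second run. A minimal sketch, assuming the single string replaces @data and each pattern is tried once per iteration, might look like the following; the variable names are illustrative, and rates will differ by machine and perl version. Because of the "!", neither pattern can match, so the difference is purely in how long each takes to fail: the anchored greedy [a-z]* should give up after a couple of tries, while \w*? retries at every length up to the "!".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# The string from the reply above: "a3" repeated 100 times, then "!3".
# The "!" means neither pattern can match, so both must fail.
our $input = ('a3' x 100) . '!3';
our ($plain, $sticky);

# Code strings are eval'd by Benchmark in package main, which is why
# these are package variables (our) rather than lexicals (my).
cmpthese -1, {
    plain  => '$plain  = $input =~ /^([a-z]*)(\d*)$/ ? 1 : 0',
    sticky => '$sticky = $input =~ /^(\w*?)(\d*)$/  ? 1 : 0',
};

print "plain matched: $plain, sticky matched: $sticky\n";
```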

        While that looks impressive and is certainly interesting, you missed my point. Even with that huge difference, you are still talking about fractions of a second unless you are dealing with at least several thousand records. If you are only doing 100 matches, or even 1,000, you aren't going to see the difference. Are there certain things to optimize for when you are using large strings and large datasets? Of course. Are those optimizations things you should always keep in mind? I don't think so. In this case I believe that clear meaning matters more, and either solution might be the clearer one depending on the situation and the programmer. My point was that your benchmark shouldn't cause anyone to avoid '?' in general just because of performance. There are certainly cases where you are right (for instance, 10k strings that are 200+ characters long), but I think that in most general cases the performance difference is minor and insignificant.


        ___________
        Eric Hodges
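To put numbers on this, the cost per match can be worked out from the first benchmark's rates (32582/s and 39385/s iterations, with 6 matches per iteration as pointed out earlier in the thread). The sketch below is only arithmetic on those reported figures, which come from one machine (eric256's run showed almost no difference); `seconds_for` is a hypothetical helper. By this calculation, 1,000 matches differ by less than a millisecond, and it takes around ten million matches before the gap reaches roughly 9 seconds.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Iterations per second reported in the first benchmark; each iteration
# matched one regex against the 6 strings in @data.
my %iterations_per_sec = (sticky => 32582, plain => 39385);
my $matches_per_iteration = 6;

# Hypothetical helper: seconds needed for $matches matches at the
# reported rate for the named pattern.
sub seconds_for {
    my ($name, $matches) = @_;
    return $matches / ($iterations_per_sec{$name} * $matches_per_iteration);
}

for my $matches (1_000, 100_000, 10_000_000) {
    my $sticky = seconds_for(sticky => $matches);
    my $plain  = seconds_for(plain  => $matches);
    printf "%10d matches: sticky %.4f s, plain %.4f s, difference %.4f s\n",
        $matches, $sticky, $plain, $sticky - $plain;
}
```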
Re^3: Simple regular expression problem
by polypompholyx (Chaplain) on Oct 03, 2005 at 14:31 UTC
    The OP didn't make it clear whether the string before the number could contain digits. However, it's certainly better to be specific in a regex: if you know (for some value of 'know') that something will only contain [A-Za-z], not \w, then the former is probably preferable. On the other hand, [A-Za-z] too often means "I cannot think of any other letters", and then your script barfs on something perfectly valid but unexpected, like "Ångström".
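To illustrate the trade-off, the sketch below checks a few made-up words against [A-Za-z]+, \w+ and \p{L}+ (Perl's Unicode "letter" property). It is an illustration, not a recommendation from the thread, and it assumes `use feature 'unicode_strings'` so that \w applies Unicode rules to "Ångström". [A-Za-z] should reject "Ångström" while \p{L} accepts it; \w accepts it too, but \w also matches digits and underscores, so it is not a letters-only class.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';    # give \w Unicode semantics on all strings
binmode STDOUT, ':encoding(UTF-8)';

# "\x{C5}ngstr\x{F6}m" is "Ångström", written with escapes so the
# source file stays plain ASCII.
my @words = ('Angstrom', "\x{C5}ngstr\x{F6}m", 'foo_bar', 'abc123');

for my $word (@words) {
    printf "%-9s [A-Za-z]+: %-3s  \\w+: %-3s  \\p{L}+: %s\n",
        $word,
        ($word =~ /^[A-Za-z]+$/ ? 'yes' : 'no'),
        ($word =~ /^\w+$/       ? 'yes' : 'no'),
        ($word =~ /^\p{L}+$/    ? 'yes' : 'no');
}
```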
Re^3: Simple regular expression problem
by jeanluca (Deacon) on Oct 03, 2005 at 14:59 UTC
    Thanks for all the suggestions. I fixed it with \w+?, or maybe I'll use the alpha example! Now that I understand my mistake, I see that it was described in the perldoc manual all along!

    All your replies are really helpful,
    Thanks a lot!!
    Luca
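The OP's original pattern isn't shown in this part of the thread. Assuming it was something like a greedy /^(\w+)(\d+)$/ (a hypothetical reconstruction), the sketch below shows why switching to \w+? fixes it. It also includes a letters-only variant using the POSIX class [[:alpha:]], which may or may not be the "alpha example" mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = 'foo123';

# Greedy: \w+ also matches digits, so it takes as much as it can and
# leaves only the last digit for \d+.
my @greedy = $string =~ /^(\w+)(\d+)$/;

# Lazy: \w+? takes as little as possible, so \d+ gets all the digits.
my @lazy = $string =~ /^(\w+?)(\d+)$/;

# Letters only: digits can never end up in the first group.
my @alpha = $string =~ /^([[:alpha:]]+)(\d+)$/;

print "greedy: @greedy\n";   # greedy: foo12 3
print "lazy:   @lazy\n";     # lazy:   foo 123
print "alpha:  @alpha\n";    # alpha:  foo 123
```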