in reply to Re: Re: First Pattern Matching
in thread First Pattern Matching

As you can read here I like to qr// my patterns. In my case I gained nothing from it (other than not biting myself), but you'd gain a great deal by qr//:ing the @patterns. And while we're talking about qr//, I'd prefer to see a qr// object instead of an o modifier in the if expression.

The o modifier is evil, and since the introduction of qr// we no longer need it. In this particular case, though, we'd be fine without either qr// or the o modifier. That is because perl is friendly enough to remember the last pattern used for every match op: if the interpolated pattern is identical (stringwise) to the previous one, the last compiled regex for that match op is reused. Since $pat never changes between iterations, no recompilation is done.

Demonstration:
my @patterns1 = ('foo') x 2;
my @patterns2 = ('bar') x 2;

use re 'debug';
while (@patterns1) {
    'a' =~ shift @patterns1;
    'b' =~ shift @patterns2;
}
__END__
Compiling REx `foo'
size 3 first at 1
   1: EXACT <foo>(3)
   3: END(0)
anchored `foo' at 0 (checking anchored isall) minlen 3
Compiling REx `bar'
size 3 first at 1
   1: EXACT <bar>(3)
   3: END(0)
anchored `bar' at 0 (checking anchored isall) minlen 3
Freeing REx: `foo'
Freeing REx: `bar'

As you can see, the foo pattern and the bar pattern are each compiled only once, even though each match op runs twice. But there's no harm in using qr// here, so I still suggest it.
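
For what it's worth, here is a minimal sketch of what qr//:ing the @patterns could look like. The sample strings and patterns are borrowed from the code under discussion; the names $alternation, $any and which_pattern are my own inventions for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Sample data in the shape of the code under discussion (assumed, not the real input).
my @strings  = ('ABCBXBCA', 'APCBXBCAC');
my @patterns = map { qr/$_/ } ('B.B', 'CB');    # compile each pattern once

# Join the compiled patterns into one alternation and compile that once too.
my $alternation = join '|', @patterns;
my $any         = qr/($alternation)/;

# Hypothetical helper: return the first qr// object that matches the captured text.
sub which_pattern {
    my ($string) = @_;
    return unless $string =~ $any;
    my $hit = $1;    # copy $1 before running any further matches
    for my $p (@patterns) {
        return $p if $hit =~ $p;
    }
    return;
}

for my $string (@strings) {
    my $p = which_pattern($string);
    print "$string: ", (defined $p ? "matched $p" : "no match"), "\n";
}
```

The inner lookup then reuses the precompiled objects instead of interpolating a plain string into a match op on every pass. How a qr// object stringifies differs between perl versions, so the printed form of the pattern will vary.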

Cheers,
-Anomo

Re: Re: Re: Re: First Pattern Matching
by jryan (Vicar) on Jul 12, 2002 at 01:19 UTC

    I'm glad you brought up this point. I too am a huge fan of qr; however, I think this situation is a perfect use of the /o operator. I assumed that the snippet the author posted was but a morsel of his actual code; he probably uses dozens of patterns and thousands of lines of input. Compiling the regex with /o (rather than building it with qr) is ideal for this situation, where a single regex is to be applied to huge amounts of data. It will result in a speed boost. For instance, I modified my earlier code and ran this benchmark:

    use Benchmark;

    timethese(1000, {
        Slasho => \&withslasho,
        None   => \&without,
        qr     => \&withqr
    });

    sub withslasho {
        my $str1 = 'ABCBXBCA';
        my $str2 = 'APCBXBCAC';
        my @array = ($str1, $str2) x 500;
        my @patterns = ('B.B', 'CB') x 10;
        my $pat = join '|', @patterns;
        foreach my $string (@array) {
            if ($string =~ /($pat)/o) {
                # do a pattern lookup to see which pattern matched.
                my $matched;
                foreach my $p (@patterns) {
                    if ($1 =~ /$p/) {
                        $matched = $p;
                        last;
                    }
                }
            }
        }
    }

    sub without {
        my $str1 = 'ABCBXBCA';
        my $str2 = 'APCBXBCAC';
        my @array = ($str1, $str2) x 500;
        my @patterns = ('B.B', 'CB') x 10;
        my $pat = join '|', @patterns;
        foreach my $string (@array) {
            if ($string =~ /($pat)/) {
                # do a pattern lookup to see which pattern matched.
                my $matched;
                foreach my $p (@patterns) {
                    if ($1 =~ /$p/) {
                        $matched = $p;
                        last;
                    }
                }
            }
        }
    }

    sub withqr {
        my $str1 = 'ABCBXBCA';
        my $str2 = 'APCBXBCAC';
        my @array = ($str1, $str2) x 500;
        my @patterns = ('B.B', 'CB') x 10;
        my $pat = join '|', @patterns;
        $pat = qr/$pat/;
        foreach my $string (@array) {
            if ($string =~ /($pat)/) {
                # do a pattern lookup to see which pattern matched.
                my $matched;
                foreach my $p (@patterns) {
                    if ($1 =~ /$p/) {
                        $matched = $p;
                        last;
                    }
                }
            }
        }
    }

    Which outputs:

    Benchmark: timing 1000 iterations of None, Slasho, qr...
          None: 70 wallclock secs (69.60 usr + 0.00 sys = 69.60 CPU) @ 14.37/s (n=1000)
        Slasho: 61 wallclock secs (61.24 usr + 0.00 sys = 61.24 CPU) @ 16.33/s (n=1000)
            qr: 66 wallclock secs (65.80 usr + 0.00 sys = 65.80 CPU) @ 15.20/s (n=1000)

      I too am a huge fan of qr;

      Ehrrmm, I see that you hardly need a lesson in how to use qr. :)

      however, I think this situation is a perfect use of the /o operator.

      I don't, especially since it doesn't work on ActivePerl. :) But there's also another reason why I don't like to use it. As I've already said here, perl optimizes away recompilation in simple cases like this. As your benchmark shows, there's not much difference with and without the o modifier, so in this case I'd prefer not to use it. The reason is that I'm scared of myself. One day I might put the code in a subroutine. Another day I might change it to take arguments that I use in the regex. If I forget about the o (and I most probably will), the regex will silently keep the pattern from the first call, and it might take me a while to discover the bug. If I use qr for this, or even nothing fancy at all, I won't get bitten.
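
A minimal sketch of the kind of bite I mean. The sub names matches_o and matches_qr are made up for this illustration; the point is only that a match op with the o modifier keeps whatever pattern it interpolated the first time it ran.

```perl
use strict;
use warnings;

# Hypothetical helper: /o freezes the pattern after the first interpolation.
sub matches_o {
    my ($string, $pat) = @_;
    return $string =~ /$pat/o ? 1 : 0;
}

# Same helper with qr//: the regex is built from the current argument each call.
sub matches_qr {
    my ($string, $pat) = @_;
    my $re = qr/$pat/;
    return $string =~ $re ? 1 : 0;
}

print matches_o('foo', 'foo'), "\n";     # 1
print matches_o('bar', 'bar'), "\n";     # 0, the op is still matching against 'foo'
print matches_qr('foo', 'foo'), "\n";    # 1
print matches_qr('bar', 'bar'), "\n";    # 1
```

The second call to matches_o fails without any warning, which is exactly why the bug can take a while to find.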

      I code with a lot of personal style and hints, and for me the o says "this is a dynamically set constant, and is supposed to be that way, so don't bother. It should never change. NEVER!". Often I don't mean that, and hence I use qr instead. I'm scared of my own benchmark results though (running your code):
      Benchmark: timing 1000 iterations of None, Slasho, qr...
            None: 30 wallclock secs (30.08 usr + 0.14 sys = 30.22 CPU) @ 33.09/s
          Slasho: 29 wallclock secs (28.74 usr + 0.02 sys = 28.76 CPU) @ 34.77/s
              qr: 31 wallclock secs (30.49 usr + 0.00 sys = 30.49 CPU) @ 32.80/s

      For me qr is even worse. This does indeed surprise me.

      Cheers,
      -Anomo