
Don't get me started on \d! Whoever decided that \d should include "something that might be understood as a digit in a language/charset I've never ever heard of" made a huge, huge mistake. Out of tens of thousands of uses of \d, there's maybe one where this nonsense is what was meant. I do believe that even now it's not too late to fix this insanity. The change would fix many times more scripts/modules than it would break.
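
To make the complaint concrete, here is a minimal sketch of what \d accepts on a Unicode string. The sample strings and variable names are made up for illustration, and the /a modifier assumes perl 5.14 or later.

```perl
use strict;
use warnings;

my $ascii  = "42";
my $arabic = "\x{664}\x{662}";   # ARABIC-INDIC DIGIT FOUR and TWO

my %result = (
    unicode_d => ($arabic =~ /^\d+$/    ? 1 : 0),  # 1: \d accepts any Unicode decimal digit
    ascii_d   => ($arabic =~ /^\d+$/a   ? 1 : 0),  # 0: /a restricts \d to 0-9
    bracket   => ($arabic =~ /^[0-9]+$/ ? 1 : 0),  # 0: an explicit class never matched them
    plain     => ($ascii  =~ /^[0-9]+$/ ? 1 : 0),  # 1: ordinary digits still match
);

printf "%-10s %d\n", $_, $result{$_} for sort keys %result;
```

So if ASCII digits are what you mean, [0-9] or the /a modifier says so explicitly.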

And what I meant regarding the speed is the difference between

    my $foo = qr/.../;
    my $bar = qr/..../;
    ...
    while (<>) {
        ...
        if (/$foo(?:$bar)+/) {
            ...
and
    my $foo = qr/.../;
    my $bar = qr/..../;
    my $foobar = qr/$foo(?:$bar)+/;
    ...
    while (<>) {
        ...
        if (/$foobar/) {
            ...
In the latter case the stringification and the compilation of the longer regexp happen just once, outside the loop.
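
A runnable sketch of the second form, for anyone who wants to try it. The patterns and the sample lines are hypothetical stand-ins for the elided ones above, not taken from any real script.

```perl
use strict;
use warnings;

# Hypothetical building blocks.
my $word     = qr/[a-z]+/;
my $sep_word = qr/-[a-z]+/;

# The combined pattern is assembled and compiled once, before the loop.
my $compound = qr/^$word(?:$sep_word)+$/;

my @lines = ("well-known", "plain", "state-of-the-art");

# Inside the loop only the precompiled object is used.
my @hits = grep { /$compound/ } @lines;

print "$_\n" for @hits;   # prints well-known and state-of-the-art
```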

Jenda
Enoch was right!
Enjoy the last years of Rome.

Re^9: Common Perl Pitfalls
by JavaFan (Canon) on Apr 11, 2012 at 20:57 UTC
    Oh, sure. And I don't give a damn about the difference in compilation speed of trivial small regexes.

    But when the regexes get large, and the difference in compiling the patterns is a few seconds vs a few minutes, I do care.

    But still, even in your simple example, it's three compilations + two stringifications vs a single compile.

    Here's a benchmark, 1 compilation vs 12 compilations and 20 stringifications:

    use Benchmark 'cmpthese';

    cmpthese -1, {
        qq => 'my $p = qq{a}; $p = qq{$p$p} for 1 .. 10; qr/$p/',
        qr => 'my $p = qr{a}; $p = qr{$p$p} for 1 .. 10; qr/$p/',
    };
    __END__
           Rate     qr     qq
    qr    914/s     --  -100%
    qq 283880/s 30949%     --
    That's with 5.15.9 (on OSX). With 5.12.3 (same box), I get:
           Rate     qr     qq
    qr    857/s     --  -100%
    qq 324588/s 37769%     --
    And, for kicks, with 5.8.9 (again, same box):
           Rate     qr     qq
    qr    508/s     --  -100%
    qq 301810/s 59290%     --
    The resulting patterns, while they match exactly the same strings, also differ significantly in size: the one built with repeated qr constructs is 19 times the size of the one built with qq.
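
    One rough way to see the size difference is to compare the stringified patterns. This is only a sketch with made-up variable names: it measures the length of the pattern text, not the memory used by the compiled regexp, and the exact ratio depends on how your perl stringifies qr objects, so no numbers are claimed here.

    ```perl
    use strict;
    use warnings;

    # Built from plain strings, compiled once at the end.
    my $plain = q{a};
    $plain = qq{$plain$plain} for 1 .. 10;
    my $from_string = qr/$plain/;

    # Built from nested qr objects; every level adds its own (?...) wrapper.
    my $nested = qr{a};
    $nested = qr{$nested$nested} for 1 .. 10;

    my $len_string = length "$from_string";
    my $len_nested = length "$nested";

    printf "from strings: %d chars\n", $len_string;
    printf "nested qr:    %d chars\n", $len_nested;
    printf "ratio:        %.1f\n",     $len_nested / $len_string;
    ```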

    I'm usually not a stickler for speed. But I make an exception when it comes to qr.

      But if done right it's 1 compilation versus 12 compilations and 20 stringifications over the whole runtime of the script! According to the first of your benchmarks, that is about 1 millisecond of difference. Huge deal indeed!

      If your regexes grow way too big, do whatever you must. Under normal conditions the difference is negligible, while the fact that I don't have to worry whether I'm writing a regexp or a single-quoted string that will eventually happen to be part of a regexp is not. That matters even though, or rather precisely because, the difference between the two is slight and rarely changes the behaviour.

      Jenda

        For me, it's the other way around. For the majority of the (sub)patterns (even with most uses of backslashes), it doesn't matter whether you write q{PAT} or qr{PAT} (it's the same keystrokes inside the braces). Meaning, there's absolutely nada difference in readability.
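
        A small sketch of what "most uses of backslashes" means in practice. The fragment and the test string are made up; the one exception shown, a doubled backslash, is an example of the rare case where q and qr differ.

        ```perl
        use strict;
        use warnings;

        # In q{} a backslash before an ordinary character is kept as is,
        # so this fragment reads exactly like its qr{} counterpart.
        my $frag = q{\d+\.\d+};
        my $re   = qr{\d+\.\d+};

        my $frag_ok = "3.14" =~ /^$frag$/ ? 1 : 0;   # 1
        my $re_ok   = "3.14" =~ /^$re$/   ? 1 : 0;   # 1

        # The exception: q{\\} is ONE backslash, which is not a valid
        # pattern on its own, while qr{\\} matches a literal backslash.
        my $one_backslash = q{\\};
        my $compiled      = eval { qr/$one_backslash/; 1 };

        printf "q fragment matches: %d, qr matches: %d\n", $frag_ok, $re_ok;
        printf "length of q{\\\\}: %d, compiles as a pattern: %s\n",
            length $one_backslash, $compiled ? "yes" : "no";
        ```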

        Why go for the expensive solution? If your pattern grows, at what moment do you revisit your program, and chop off the r in qr?

        It's not that I never use qr. Sometimes, there's a (sub)pattern that's more readable as qr than as q. And sometimes, one does want a first class regexp construct. But those are the exceptions.

        Do note that using q building blocks to build your patterns gives you more flexibility than limiting yourself to just qr:

        my $vowels = 'aeiou';
        my $odds   = '13579';
        my $odd_or_vowel = "[$vowels$odds]";   # double quotes, so the variables interpolate
        To write that as qr, you'd have to write something like:
        my $vowels = qr/[aeiou]/;
        my $odds   = qr/[13579]/;
        my $odd_or_vowel = qr/$vowels|$odds/;
        which, while matching the same language, throws off the optimizer: not only is the compilation slower, the match itself is slower too.
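
        For anyone who wants to check that the two constructions accept the same characters, here is a minimal sketch. The variable names and test characters are made up, and it says nothing about speed; that would need a benchmark on your own data.

        ```perl
        use strict;
        use warnings;

        # String building blocks merged into a single character class.
        my $vowels = 'aeiou';
        my $odds   = '13579';
        my $class  = qr/[$vowels$odds]/;

        # qr building blocks can only be combined by alternation.
        my $vowel_re = qr/[aeiou]/;
        my $odd_re   = qr/[13579]/;
        my $alt      = qr/$vowel_re|$odd_re/;

        my %seen;
        for my $ch (qw(a 7 b 4)) {
            $seen{$ch} = [ ($ch =~ /^$class$/ ? 1 : 0), ($ch =~ /^$alt$/ ? 1 : 0) ];
            printf "%s: class=%d alternation=%d\n", $ch, @{ $seen{$ch} };
        }

        # Printing the patterns shows how differently they are put together.
        print "class:       $class\n";
        print "alternation: $alt\n";
        ```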