
Are you sure you will remember to quadruple your backslashes if you want to match a literal backslash?
Sure. But how's that relevant? If I were to write a subpattern that matches a backslash, I may write that as qr/\\/ -- but that doesn't mean that's enough reason to always use qr, even if it's intended to match something different from a backslash.
There is a huge difference between $part = qr/\\d/; and $part = q/\\d/; !
I know. Often, both are wrong.
$pat1 = '[0-9]'; $pat2 = qr/[0-9]/;
is what's usually intended.
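To make the distinction concrete, here is a minimal sketch. The variable names and the `matches` helper are made up for illustration, and it only assumes standard single-quote and `qr` semantics:

```perl
use strict;
use warnings;

my $as_string  = q/\\d/;     # single-quoted: the string holds the two characters \d
my $as_regex   = qr/\\d/;    # compiled: matches a literal backslash followed by "d"
my $quadrupled = q/\\\\/;    # the string holds \\, which as a pattern matches one backslash

my $digit       = '5';
my $backslash_d = '\d';      # two characters: a backslash, then d
my $one_slash   = '\\';      # a single backslash

# Hypothetical helper: anchor the pattern and report the outcome as text.
sub matches {
    my ($text, $pattern) = @_;
    return $text =~ /\A$pattern\z/ ? 'yes' : 'no';
}

print "string part vs digit:        ", matches($digit,       $as_string),  "\n";   # yes
print "string part vs backslash-d:  ", matches($backslash_d, $as_string),  "\n";   # no
print "qr part vs digit:            ", matches($digit,       $as_regex),   "\n";   # no
print "qr part vs backslash-d:      ", matches($backslash_d, $as_regex),   "\n";   # yes
print "quadrupled vs one backslash: ", matches($one_slash,   $quadrupled), "\n";   # yes
```

The same four characters between the delimiters therefore mean "a digit" in one case and "a backslash followed by d" in the other.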
And regarding the speed of constructed pattern ... maybe you stopped one qr// too soon. Instead of building the ultimate pattern at the point it was used, you should have built it just once at the same place you've defined the parts and then used just if ($var =~ $built_regexp) or $var =~ s/$built_regexp/replacement/;
I've no clue what you're trying to say.
I haven't seen your code.
Indeed.

Re^8: Common Perl Pitfalls
by Jenda (Abbot) on Apr 11, 2012 at 20:21 UTC

    Don't get me started on \d! Whoever decided to include "something that might be understood as a digit in a language/charset I've never ever heard of" in \d made a huge, huge mistake. Out of ten thousand uses of \d, there's maybe one where this nonsense is what was meant. I do believe that even now it's not too late to fix this insanity. The change would fix many times more scripts/modules than it would break.
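A small sketch of the behaviour being complained about, assuming Perl 5.14 or later (needed for the `/a` modifier); the test character is picked purely for illustration:

```perl
use 5.014;       # the /a modifier requires Perl 5.14 or later
use warnings;

my $five = "\x{0665}";    # ARABIC-INDIC DIGIT FIVE, a Unicode decimal digit

my $plain_d = $five =~ /\A\d\z/    ? 'yes' : 'no';   # Unicode rules: matches
my $ascii_d = $five =~ /\A\d\z/a   ? 'yes' : 'no';   # /a restricts \d to 0-9
my $class   = $five =~ /\A[0-9]\z/ ? 'yes' : 'no';   # explicit ASCII range

say "plain \\d matches: $plain_d";
say "\\d with /a:       $ascii_d";
say "[0-9]:            $class";
```

Writing `[0-9]`, or adding `/a` where it is available, is how to say "ASCII digit" explicitly.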

    And what I meant regarding the speed is the difference between

    my $foo = qr/.../;
    my $bar = qr/..../;
    ...
    while (<>) {
        ...
        if (/$foo(?:$bar)+/) {
            ...
    and
    my $foo = qr/.../;
    my $bar = qr/..../;
    my $foobar = qr/$foo(?:$bar)+/;
    ...
    while (<>) {
        ...
        if (/$foobar/) {
            ...
    In the latter case the stringification and the compilation of the longer regexp happen just once.
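A runnable sketch of the second form. The two sub-patterns and the sample lines are hypothetical stand-ins for the elided ones, chosen only so the script does something:

```perl
use strict;
use warnings;

# Hypothetical parts, for illustration only.
my $foo = qr/[a-z]+/;
my $bar = qr/-[0-9]+/;

# Stringify the parts and compile the combined pattern once, up front.
my $foobar = qr/\A$foo(?:$bar)+\z/;

my @lines = ('abc-1-22', 'abc', '-7', 'x-9');

# Inside the loop the precompiled object is used directly.
my @hits = grep { $_ =~ $foobar } @lines;

print "$_\n" for @hits;    # abc-1-22 and x-9
```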

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      Oh, sure. And I don't give a damn about the difference in compilation speed of trivial small regexes.

      But when the regexes get large, and the difference in compiling the patterns is a few seconds vs a few minutes, I do care.

      But still, even in your simple example, it's three compilations + two stringifications vs a single compile.

      Here's a benchmark, 1 compilation vs 12 compilations and 20 stringifications:

      use Benchmark 'cmpthese';
      cmpthese -1, {
          qq => 'my $p = qq{a}; $p = qq{$p$p} for 1 .. 10; qr/$p/',
          qr => 'my $p = qr{a}; $p = qr{$p$p} for 1 .. 10; qr/$p/',
      };
      __END__
             Rate     qr     qq
      qr    914/s     --  -100%
      qq 283880/s 30949%     --
      That's with 5.15.9 (on OSX). With 5.12.3 (same box), I get:
             Rate     qr     qq
      qr    857/s     --  -100%
      qq 324588/s 37769%     --
      And, for kicks, with 5.8.9 (again, same box):
             Rate     qr     qq
      qr    508/s     --  -100%
      qq 301810/s 59290%     --
      The resulting patterns, while functionally identical, also differ significantly in size: the one built with repeated qr constructs is 19 times the size of the one built with qq.
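The size difference can be checked with a short sketch like the following, which rebuilds both patterns the same way as the benchmark and compares their stringified lengths. The variable names are illustrative, and no expected figure is given because the wrapper text that `qr` adds on stringification differs between Perl versions:

```perl
use strict;
use warnings;

# Built from plain strings, compiled once at the end.
my $qq = qq{a};
$qq = qq{$qq$qq} for 1 .. 10;
my $from_qq = qr/$qq/;

# Built from qr objects, so every step stringifies and recompiles.
my $qr = qr{a};
$qr = qr{$qr$qr} for 1 .. 10;
my $from_qr = qr/$qr/;

my $len_qq = length("$from_qq");
my $len_qr = length("$from_qr");

# Both patterns should accept exactly 1024 "a" characters.
my $subject    = 'a' x 1024;
my $both_match = ($subject =~ /\A$from_qq\z/ && $subject =~ /\A$from_qr\z/) ? 'yes' : 'no';

printf "qq-built length: %d\n",   $len_qq;
printf "qr-built length: %d\n",   $len_qr;
printf "size ratio:      %.1f\n", $len_qr / $len_qq;
printf "both match:      %s\n",   $both_match;
```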

      I'm usually not a stickler for speed. But I make an exception when it comes to qr.

        But if done right it's 1 compilation versus 12 compilations and 20 stringifications over the whole runtime of the script! According to the first of your benchmarks, that is about 1 millisecond of difference. Huge deal indeed!

        If your regexes grow way too big, do whatever you must. Under normal conditions the difference is negligible, while the fact that I don't have to worry whether I'm writing a regexp or a single-quoted string that will eventually become part of a regexp is not. That holds even though, or rather precisely because, the difference between the two is slight and rarely changes the behaviour.

        Jenda