in reply to Warning about playing with matches

As long as all the alternatives are fixed strings, perl 5.10.0 onwards will compile them into a single data structure called a trie, a kind of tree that allows all the alternatives to be scanned for in a single pass. So it's very efficient. The trick is to keep any regex metacharacters outside the alternation, e.g.
/\s+(abc|def|ghij|....)/         # good, will be optimised
/(\s+abc|\s+def|\s+ghij|....)/   # bad
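If the alternatives come from a list rather than being typed by hand, the same idea can be sketched as below. This is only an illustration: the word list and the variable names ($alt, $good, $bad) are made up for the example, and quotemeta is used on the assumption that the words should be matched literally.

```perl
use strict;
use warnings;

# Hypothetical word list; stands in for the real alternatives.
my @words = qw(abc def ghij);

# quotemeta escapes any regex metacharacters, so each alternative
# stays a fixed string.
my $alt = join '|', map { quotemeta($_) } @words;

# The shared \s+ sits outside the group; only literals remain inside.
my $good = qr/\s+($alt)/;

# For comparison: every branch repeats \s+, so the branches are no
# longer fixed strings. Note the capture now includes the whitespace.
my $bad = qr/(\s+abc|\s+def|\s+ghij)/;

for my $text ("foo  ghij bar", "nothing here") {
    if ($text =~ $good) {
        print "matched '$1' in '$text'\n";
    }
    else {
        print "no match in '$text'\n";
    }
}
```

Both patterns match the same input strings; the difference is only in what is captured and in whether the alternation is made of plain literals.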

Dave.

Re^2: Warning about playing with matches
by ExReg (Priest) on Oct 13, 2015 at 19:55 UTC
    Agreed. I would try to keep the optional spaces as far outside the alternation as possible. And I am guessing that I would not want to use the sorted version I have listed above for performance reasons. I would probably want to have them more in the order that they appear in the section listings above, since you are more likely to have a simple case than a case four levels deep. Now if only I could get 5.10.0...
      And I am guessing that I would not want to use the sorted version I have listed above for performance reasons
      With a trie it doesn't matter. Any (list|of|fixed|words), no matter how long and whether sorted or not, will be pre-compiled into a trie. The only thing you should do is deduplicate them. To give you some idea of the performance difference, taking the pattern at the end of your posting and doing a repeated failed match against a long string:
      my $r = qr/(?:u|uau|uauau| .... ufududufubudufufudufubudufu)/x;
      my $s = "a" x 1000000;
      $s =~ $r for 1..10;

      On my laptop, this takes 27s on 5.8.9 and 0.024s on 5.10.0 and later.
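The deduplication step mentioned above could look something like this. It is a minimal sketch: the short word list is a stand-in for the real one, and %seen, @unique and $re are names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical word list containing a repeated entry ("uau").
my @words = qw(u uau uauau uau ufu);

# Keep only the first occurrence of each word, preserving order.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

# Join the literal alternatives into one non-capturing group.
my $alt = join '|', map { quotemeta($_) } @unique;
my $re  = qr/(?:$alt)/;

print scalar(@unique), " unique alternatives\n";
```

On 5.10 or later, putting use re 'debug'; before the qr// line prints the compiled regex program, which should show a TRIE node if the optimisation was applied.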

      perl 5.10.0 was released about 8 years ago. If you're going to do lots of matching against big word lists it would pay to upgrade to something newer than 5.8.x.

      Dave.