
Agreed. I would try to keep the optional spaces as far to the outside as possible. And I am guessing that I would not want to use the sorted version I have listed above for performance reasons. I would probably keep the alternatives closer to the order in which they appeared in the section listings above, since a simple case is more likely than a case four levels deep. Now if only I could get 5.10.0...

Re^3: Warning about playing with matches
by dave_the_m (Monsignor) on Oct 13, 2015 at 21:25 UTC
    And I am guessing that I would not want to use the sorted version I have listed above for performance reasons
    With a trie it doesn't matter. Any (list|of|fixed|words), no matter how long and whether sorted or not, will be pre-compiled into a trie. The only thing you should do is deduplicate them. To give you some idea of the performance difference, take the pattern at the end of your posting and do a repeated failed match against a long string:

    my $r = qr/(?:u|uau|uauau| .... ufududufubudufufudufubudufu)/x;
    my $s = "a" x 1000000;
    $s =~ $r for 1..10;

    On my laptop, this takes 27s on 5.8.9 and 0.024s on 5.10.0 and later.
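
    The deduplication step could look roughly like the sketch below. It assumes the alternatives are plain fixed strings held in an array; @words is a made-up stand-in for illustration, not the pattern from the original post, and the optional-space handling is left out.

```perl
use strict;
use warnings;

# Hypothetical word list containing duplicates (illustration only).
my @words = qw(u uau uauau ufu uau u ubu);

# Deduplicate while keeping the first-seen order of the alternatives.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

# quotemeta keeps every alternative a literal string, so the whole
# group stays a list of fixed words.
my $alternation = join '|', map { quotemeta } @unique;
my $re = qr/(?:$alternation)/;

print scalar(@unique), " unique alternatives\n";
print "matched\n" if "xxufuxx" =~ $re;
```

    Keeping first-seen order rather than sorting is a choice made for this sketch only; per the above, order should not matter for speed once the list is compiled into a trie.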

    perl 5.10.0 was released about 8 years ago. If you're going to do lots of matching against big word lists it would pay to upgrade to something newer than 5.8.x.

    Dave.

        I remember an upper limit for the trie optimization. Are you saying it was removed?
        Interesting, I did not know that. It appears that when the initial regex opnode list is constructed, if it has more than 65535 nodes (so the BRANCH nodes have to use LONGJMP nodes to continue) then the trie optimisation doesn't kick in. This is indeed still the case.
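
        One way to check whether a given pattern gets the trie treatment is to look at the compiled program that `use re 'debug'` dumps to STDERR. The following is only a sketch: it assumes a POSIX-style shell, a perl path without spaces, and that the dump names trie nodes with the text "TRIE" (as 5.10 and later appear to do). The three-word pattern is a placeholder.

```perl
use strict;
use warnings;

# Compile a small fixed-word alternation in a child perl with regex
# debugging switched on; the compile-time dump is written to STDERR,
# so it is redirected into the captured output.
my $dump = qx{$^X -Mre=debug -e "qr/(?:foo|bar|baz)/" 2>&1};

# Assumption: a trie node shows up in the dump with "TRIE" in its name.
my $uses_trie = $dump =~ /TRIE/ ? 1 : 0;

print $uses_trie
    ? "trie node found in the compiled regex\n"
    : "no trie node in the compiled regex\n";
```

        Running the same check against a generated pattern that is large enough to need LONGJMP nodes should show whether the limit applies on a particular perl.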

        Dave.