in reply to Re^8: Parser Performance Question (updated)
in thread Parser Performance Question

Atomic grouping (?>...) seems to fix the problem
use strict;
use warnings;
use feature 'say';
use Data::Dump qw( pp );

# Test strings; qw() turns each \\ into a single backslash
my @strs = qw( "..\\".. "abc" "a\"bc" "a\\\\bc" "a\" );

my %re = (
    LanX => qr/ " (?> \\\\ | \\" | [^"] )* " /x,
    Eily => qr/ " (?: [^"\\] | \\. )* " /x
);

for my $str (@strs) {
    say "\nTesting: <$str> = ", pp($str);
    $str =~ /$re{$_}/
        and say "$_ found $&"
        or  say "$_ found nothing"
        for keys %re;
}

Testing: <"..\"..> = "\"..\\\".."
LanX found nothing
Eily found nothing

Testing: <"abc"> = "\"abc\""
LanX found "abc"
Eily found "abc"

Testing: <"a\"bc"> = "\"a\\\"bc\""
LanX found "a\"bc"
Eily found "a\"bc"

Testing: <"a\\bc"> = "\"a\\\\bc\""
LanX found "a\\bc"
Eily found "a\\bc"

Testing: <"a\"> = "\"a\\\""
LanX found nothing
Eily found nothing
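To see why the atomic group matters, here is a minimal sketch that compares LanX's pattern with the same pattern written with a plain (?:...) group. The names $plain and $atomic are made up for this illustration. Without atomic grouping the engine can backtrack into the group and re-read the backslash of \" as an ordinary [^"] character, so the unterminated "a\" is wrongly accepted.

```perl
use strict;
use warnings;

# Same alternation, once with a plain group and once with an atomic group.
my $plain  = qr/ " (?: \\\\ | \\" | [^"] )* " /x;
my $atomic = qr/ " (?> \\\\ | \\" | [^"] )* " /x;

# Four characters: quote, a, backslash, quote (no real closing quote).
my $str = '"a\"';

# Plain group: after \" is consumed and the closing quote is missing,
# the engine retries the backslash as [^"] and then treats the
# escaped quote as the terminator.
my $plain_hit  = $str =~ $plain  ? $& : undef;

# Atomic group: once \" has matched, that choice cannot be revisited.
my $atomic_hit = $str =~ $atomic ? $& : undef;

print "plain:  ", (defined $plain_hit  ? $plain_hit  : "nothing"), "\n";
print "atomic: ", (defined $atomic_hit ? $atomic_hit : "nothing"), "\n";
```

The plain version reports a match on the whole string, while the atomic version finds nothing, which agrees with the "LanX found nothing" line for this input.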

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re^10: Parser Performance Question (Atomic grouping)
by Eily (Monsignor) on Oct 06, 2017 at 12:37 UTC

    I had completely forgotten about the atomic match :). I reread the documentation (because I wasn't sure what exactly it does), and it shows that the possessive quantifiers are equivalent to it. So in the spirit of TIMTOWTDI:

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dump qw( pp );

    # Test strings; qw() turns each \\ into a single backslash
    my @strs = qw( "..\\".. "abc" "a\"bc" "a\\\\bc" "a\" );

    my %re = (
        LanX => qr/ " (?> \\\\ | \\" | [^"] )* " /x,
        Eily => qr/ " (?: [^"\\] | \\. )* " /x,
        Poss => qr/ " (?: \\\\ | \\" | [^"] )*+ " /x,
    );

    for my $str (@strs) {
        say "\nTesting: <$str> = ", pp($str);
        $str =~ /$re{$_}/
            and say "$_ found $&"
            or  say "$_ found nothing"
            for keys %re;
    }
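The equivalence mentioned in the documentation can be checked on a much smaller case. This is only a sketch with made-up variable names: X*+ is shorthand for (?>X*), and both refuse to give back characters that a plain greedy X* would release.

```perl
use strict;
use warnings;

my $str = 'aaa';

# Greedy: a* first takes all three, then gives one back so the final 'a' can match.
my $greedy     = $str =~ / a*  a /x     ? 1 : 0;

# Possessive: a*+ takes all three and never gives any back, so the final 'a' fails.
my $possessive = $str =~ / a*+ a /x     ? 1 : 0;

# Atomic: (?>a*) behaves the same way as a*+.
my $atomic     = $str =~ / (?>a*) a /x  ? 1 : 0;

print "greedy=$greedy possessive=$possessive atomic=$atomic\n";
# prints: greedy=1 possessive=0 atomic=0
```

Note that LanX's pattern makes each iteration atomic, (?>...)*, while Poss makes the whole repetition atomic, (?:...)*+. These are not literally the same construct, but as far as I can tell they give the same answers on the test strings above.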

      Great discussion, we (re)learned a lot! :)

      edit

      I saw atomic grouping discussed in Friedl's book, but this usage example is very instructive.

      And inhibiting backtracking has a great performance benefit; I think I have to revisit some older projects of mine.

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

        Ah yes, I have a great potential for relearning :P

      I appreciated reading the conversation you guys had, sorry I wasn't able to take part. I'm now using a slightly modified version of Eily's regex (proven using the above framework and in my own tests):

      our $RXdqs = qr/ " (?> \\. | [^"\\] )* " /x;

      Note that all of my $RX... regex variables are used inside other regexes and surrounded on both sides by \s* and various specific characters like parentheses and commas. This is a parser for a formally defined syntax that proceeds through the input text serially; I'm not trying to find $needle inside some giant $haystack. I do have individual tests for these variables now; previously I was only testing the parser at a higher level.
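As a rough illustration of that kind of embedding (not the actual parser): the sketch below wraps $RXdqs in a hypothetical $RXpair pattern with \s*, parentheses and a comma. $RXpair and the sample inputs are assumptions made up for the example.

```perl
use strict;
use warnings;

our $RXdqs = qr/ " (?> \\. | [^"\\] )* " /x;

# Hypothetical enclosing rule: a parenthesised, comma-separated pair of strings.
my $RXpair = qr/ \( \s* ($RXdqs) \s* , \s* ($RXdqs) \s* \) /x;

# Well-formed input: the escaped quote and the embedded comma stay inside their strings.
my $good  = '( "a\"b" , "c,d" )';
my @parts = $good =~ / \A \s* $RXpair \s* \z /x;

# Malformed input: the first string is never terminated, so the whole rule fails
# instead of silently pairing up the wrong quotes.
my $bad  = '( "a\" , "c" )';
my @none = $bad =~ / \A \s* $RXpair \s* \z /x;

print "good: @parts\n";
print "bad:  ", (@none ? "@none" : "no match"), "\n";
```

Because $RXdqs contains no capturing groups of its own, the captures in the enclosing pattern keep their numbering, which is convenient when several such building blocks are combined.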

      - Andrew