in reply to How do I avoid regex engine bumping along inside an atomic pattern?

Even if there were no x in the comment in the latter example, your pattern would still match at the final x, because that is an x preceded by zero comments. Could it be that the pattern you want is something like $str =~ m/(^|$c) x/?

In general you have a more basic problem: you're trying to use regular expressions for parsing, which they are poorly suited for. Instead, use regular expressions for tokenizing and move the parsing logic into code. The basic trick is to use pos and the \G assertion liberally in regular expressions that carry the /g modifier.
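As a rough illustration of that trick, here is a minimal sketch of a tokenizer driven by pos, \G and /gc. The tokenize function and its three token classes (whitespace, "--" comments, words) are assumptions made up for this example, not something from the original question.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: returns the words in $str, skipping
# whitespace and "--" comments that run to the end of the line.
sub tokenize {
    my ($str) = @_;
    my @tokens;
    pos($str) = 0;
    while (pos($str) < length $str) {
        # Every pattern starts with \G, so it can only match exactly
        # where the previous match stopped. /c keeps pos on failure.
        if    ($str =~ /\G\s+/gc)               { next }
        elsif ($str =~ /\G--[^\n]*(?:\n|\z)/gc) { next }
        elsif ($str =~ /\G(\w+)/gc)             { push @tokens, $1 }
        else  { die "unexpected character at offset " . pos($str) . "\n" }
    }
    return @tokens;
}

my @tokens = tokenize("a -- x in a comment\n x");
print "@tokens\n";   # prints "a x"
```

Deciding whether an x is allowed to follow an a then becomes ordinary code that walks the token list, rather than something the regex has to encode.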


Replies are listed 'Best First'.
Re^2: How do I avoid regex engine bumping along inside an atomic pattern?
by zemane (Novice) on Aug 24, 2008 at 16:55 UTC
    Hi. If there were no x in the comment, the second test would still fail, because there would be no ' x' to match (note the blank space before the x). But I agree with you that I am trying to do too much with regular expressions. I believe I can do the following:
    my $c = qr/(?>\s|--[^\n]*(?:\n|\z))/; # one whitespace or one comment

    # later on, when parsing...
    pos($str) = 0;
    if ($str =~ m/a/gc) { print "found a\n" } else { print "missing a\n" }
    $str =~ m/$c*/gc; # skip any comments, whitespaces
    if ($str =~ m/x/gc) { print "found x\n" } else { print "missing x\n" }
    I am not sure if I need to set pos($str) to 0 at the beginning. And I am not sure if I need to use \G when parsing.

    But again, thanks for your ideas!

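      You don't need to set pos($str) to 0 at the beginning; it starts out undef, which does the same thing. However, a failed /g match resets pos, so you do need to restore it after every failed match before you try to match again, unless you use the /c modifier as your code does, which leaves pos alone on failure.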
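To make the pos behaviour concrete, here is a small sketch. The helper name pos_after_failed_match is made up for this example. It advances pos with one successful match, then attempts a match that fails, with or without /c.

```perl
use strict;
use warnings;

# Hypothetical helper: report pos after a failed /g match.
sub pos_after_failed_match {
    my ($use_c) = @_;
    my $str = "abc";          # pos($str) is undef on a fresh string
    $str =~ /b/g;             # succeeds, pos($str) is now 2
    if ($use_c) { $str =~ /z/gc }   # fails, /c keeps pos at 2
    else        { $str =~ /z/g  }   # fails, pos is reset to undef
    return pos($str);
}

my $with    = pos_after_failed_match(1);
my $without = pos_after_failed_match(0);
print "with /c: $with\n";                                        # with /c: 2
print "without /c: ", (defined $without ? $without : "undef"), "\n";  # without /c: undef
```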

      But you do need to use \G, or else you get your original problem. Using a \G at the start of your RE says, "Does this match right where I left off?" Leaving it out says, "Search from where I left off to find where it matches." So the latter will search ahead and find matches inside comments. With the former, your code always knows whether it is inside a comment or not; with the latter, it does not.
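The difference can be seen in a small sketch built around the $c pattern from the post above. The finds_x helper and the sample strings are assumptions for illustration only. In the first sample the token after the a is a y, and the only x sits inside a later comment.

```perl
use strict;
use warnings;

my $c = qr/(?>\s|--[^\n]*(?:\n|\z))/;   # one whitespace or one comment

# Hypothetical helper: after a leading "a" and any comments or
# whitespace, is the next thing an x? $anchored selects \G or not.
sub finds_x {
    my ($str, $anchored) = @_;
    pos($str) = 0;
    $str =~ /\Ga/gc or return 0;
    $str =~ /\G$c*/gc;                  # skip comments and whitespace
    if ($anchored) {
        return $str =~ /\Gx/gc ? 1 : 0; # must match right here
    }
    return $str =~ /x/gc ? 1 : 0;       # free to search ahead
}

my $bad  = "a -- note\n y -- x here\n";
my $good = "a -- x in comment\n x";

print finds_x($bad,  1), "\n";   # 0: next token is y, not x
print finds_x($bad,  0), "\n";   # 1: found the x inside the later comment
print finds_x($good, 1), "\n";   # 1
```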

      About the second test, I suspect you didn't say exactly what you meant to say in the original question...