Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, this is my first question. I'm using the /g and /x flags so I can write a global regex in extended form, with # comments and whitespace. I want to fetch all matches into @matches. My regex has a lookahead, and it doesn't work with the x flag. Without the lookahead, it works. Specifically, I want to get all matches into an array and loop through them.
my @matches = ($_ =~ m/(<p class=g>.*?<a href=https?:\/\/)([^<]*?>)(.*?(?=<p class=g>))/g); # works.
#my @matches = ($_ =~ m/(<p class=g>.*?<a href=https?:\/\/)([^<]*?>)(.*?(?=<p class=g>))/gx); # doesn't work.
my @matches = ($_ =~ m/(<p class=g>.*?<a href=https?:\/\/)([^<]*?>)(.*?<p class=g>)/g); # got rid of lookahead. works.
I know the lookahead probably isn't strictly necessary here, but regardless, what's the deal? How do I have comments and lookaheads at the same time? Thanks!

Replies are listed 'Best First'.
Re: Regex Extended Comments with lookahead?
by JediWizard (Deacon) on Nov 10, 2004 at 19:38 UTC

    I think you are misunderstanding what the /x modifier is doing for this regular expression. From perlre:

    The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early

    That means that if you actually want to match a whitespace character in your expression when using the /x modifier, you need to either backslash it or use \s.

    my @matches = (m/(<p\sclass=g>.*?<a\shref=https?:\/\/)([^<]*?>)(.*?(?=<p\sclass=g>))/gx); # Should work.
    my @matches = (m/(<p\ class=g>.*?<a\ href=https?:\/\/)([^<]*?>)(.*?(?=<p\ class=g>))/gx); # Should also work.
    my @matches = (m/(<p\s+class=g>.*?<a\s+href=https?:\/\/)([^<]*?>)(.*?(?=<p\s+class=g>))/gx); # Probably more robust like this.
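To show comments and a lookahead living together under /x, here is a minimal sketch of the third pattern spread over several lines. The sample string in $html is made up for illustration and is not from the original post; the brace delimiters are a choice made here so that the slashes in the URL need no escaping, and the comments deliberately avoid braces so they cannot confuse the delimiter matching.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for this sketch.
my $html = '<p class=g>one <a href=http://example.com/a>First</a> text A'
         . '<p class=g>two <a href=https://example.org/b>Second</a> text B'
         . '<p class=g>';

my @matches = $html =~ m{
    ( <p\s+class=g> .*? <a\s+href=https?:// )   # group 1: paragraph tag up to the URL scheme
    ( [^<]*? > )                                # group 2: rest of the URL and the closing bracket
    ( .*? (?= <p\s+class=g> ) )                 # group 3: text up to, not including, the next paragraph tag
}gx;

# In list context with /g, the captures of every match are returned flattened,
# so walk the array three elements at a time.
for (my $i = 0; $i < @matches; $i += 3) {
    my ($prefix, $url, $body) = @matches[$i .. $i + 2];
    print "url=$url body=$body\n";
}
```

Because the lookahead does not consume the next <p class=g>, that tag is still available as the start of the following match, which is why both entries in the sample are found.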
    May the Force be with you
      Awesome!!!!

      I love perlmonks!

      :)

Re: Regex Extended Comments with lookahead?
by Eimi Metamorphoumai (Deacon) on Nov 10, 2004 at 19:43 UTC
    It has nothing to do with the lookahead. The problem is that /x causes the spaces in your pattern to be ignored. So "foo bar" =~ /foo bar/x; won't match, but "foo bar" =~ /foo\ bar/x; will. If you backslash your spaces (or use \s+ for more flexibility), it should work fine.
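A small sketch of the point above, which can be run as-is. The variable names are only illustrative, and the character-class form [ ] is an extra option taken from the perlre passage quoted earlier in the thread (whitespace inside a character class is unaffected by /x).

```perl
use strict;
use warnings;

my $text = "foo bar";

my $plain    = $text =~ /foo bar/     ? 1 : 0;  # no /x: the space is literal
my $ignored  = $text =~ /foo bar/x    ? 1 : 0;  # /x drops the space, so this is really /foobar/
my $escaped  = $text =~ /foo\ bar/x   ? 1 : 0;  # backslashed space survives /x
my $class    = $text =~ /foo[ ]bar/x  ? 1 : 0;  # space inside a character class survives /x
my $flexible = $text =~ /foo\s+bar/x  ? 1 : 0;  # \s+ accepts any run of whitespace

print "plain=$plain ignored=$ignored escaped=$escaped class=$class flexible=$flexible\n";
# prints: plain=1 ignored=0 escaped=1 class=1 flexible=1
```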
Re: Regex Extended Comments with lookahead?
by Anonymous Monk on Nov 10, 2004 at 19:09 UTC
    That was me. I'm just posting to see if I get a reply notify in my email.
      Okay, me again, now I'm logged in as thpyahoo.
        do I get mail?