I think you are misunderstanding what the /x modifier is doing for this regular expression. From perlre:
The /x modifier itself needs a little more explanation. It tells the regular expression parser to ignore whitespace that is neither backslashed nor within a character class. You can use this to break up your regular expression into (slightly) more readable parts. The # character is also treated as a metacharacter introducing a comment, just as in ordinary Perl code. This also means that if you want real whitespace or # characters in the pattern (outside a character class, where they are unaffected by /x), that you'll either have to escape them or encode them using octal or hex escapes. Taken together, these features go a long way towards making Perl's regular expressions more readable. Note that you have to be careful not to include the pattern delimiter in the comment--perl has no way of knowing you did not intend to close the pattern early
That means that if you actually want to match a whitespace character in your expression while using the /x modifier, you need to either backslash it or use \s.
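To see the difference, here is a small sketch. The sample string is made up for illustration and is not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input, invented for illustration.
my $html = '<p class=g>';

# Without /x, the literal space in the pattern matches the space in the input.
my $plain = $html =~ /<p class=g>/ ? 1 : 0;

# With /x, the unescaped space is ignored, so the pattern is really /<pclass=g>/.
my $ignored = $html =~ /<p class=g>/x ? 1 : 0;

# Under /x, whitespace must be backslashed or written as \s to be matched.
my $escaped = $html =~ /<p\ class=g>/x ? 1 : 0;
my $class   = $html =~ /<p\s+class=g>/x ? 1 : 0;

# Prints: plain=1 ignored=0 escaped=1 class=1
print "plain=$plain ignored=$ignored escaped=$escaped class=$class\n";
```

Only the second pattern fails, because /x silently drops the space between "p" and "class".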
my @matches = (m/(<p\sclass=g>.*?<a\shref=https?:\/\/)([^<]*?>)(.*?(?=\s+<p\sclass=g>))/gx); # Should work.
my @matches = (m/(<p\ class=g>.*?<a\ href=https?:\/\/)([^<]*?>)(.*?(?=\ +<p\ class=g>))/gx); # Should also work.
my @matches = (m/(<p\s+class=g>.*?<a\s+href=https?:\/\/)([^<]*?>)(.*?(?=\s+<p\s+class=g>))/gx); # Probably more robust like this.
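Since /x is in effect anyway, the pattern can also be spread over several lines with comments. This is only a sketch: the input string is hypothetical, and the lookahead is written as (?=\s+...) on the assumption that whitespace separates one result paragraph from the next. Note that the comments must not contain the / delimiter.

```perl
use strict;
use warnings;

# Hypothetical input, invented for illustration; real markup will differ.
my $html = "<p class=g>foo <a href=http://example.com/page>Title</a> snippet\n<p class=g>";

my @matches = $html =~ m/
    ( <p\s+class=g> .*? <a\s+href=https?:\/\/ )   # 1: everything up to the URL scheme
    ( [^<]*? > )                                  # 2: rest of the URL and the closing bracket
    ( .*? (?= \s+ <p\s+class=g> ) )               # 3: text up to the next result paragraph
/gx;

# In list context with the g flag, every capture group of every match is returned.
print scalar(@matches), " captures\n";
print "[$_]\n" for @matches;
```

The whitespace used for layout is ignored, while each \s+ still matches real whitespace in the input.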
In reply to Re: Regex Extended Comments with lookahead?
by JediWizard
in thread Regex Extended Comments with lookahead?
by Anonymous Monk