in reply to I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

G'day SergioQ,

I feel I should start by echoing what others have said about not using a regex for parsing this type of data.

You haven't shown all of the data (which was probably a good move if it's huge). However, I'd suspect you have something like "<div class="soda"></div>" earlier in the data; that would explain why $1 is a zero-length string (assuming that's what you meant by "is nothing").

I recommend that you use Regexp::Debugger to see exactly what is being matched.

As a general rule, peppering a regex with .* or .*? is a bad move: it will often produce unexpected, or at least unanticipated, results.
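To illustrate the kind of surprise I mean, here's a minimal sketch. The data in $html is made up for the demonstration (it is not your data), and the two patterns are only there for comparison. The lazy .*? happily runs across a tag boundary to reach the text it's looking for, whereas a negated character class cannot get past the closing ">" of the tag it started in.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical data: a "plain" div followed by a "soda" div.
my $html = '<div class="plain"></div><div class="soda">fizz</div>';

# Lazy dot: starts at the FIRST <div and stretches until it
# finds class="soda">, swallowing the whole first div on the way.
my ($lazy) = $html =~ m{<div(.*?)class="soda">};

# Negated character class: can never cross a '>', so the match
# has to start at the second <div.
my ($bounded) = $html =~ m{<div([^>]*)class="soda">};

say "lazy:    |$lazy|";
say "bounded: |$bounded|";
```

The first capture contains the whole of the preceding div; the second contains just the single space before the attribute.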

If you want to match all characters up to, and including, some terminal character, then match a run of characters that aren't the terminal character, followed by the terminal character itself (e.g. [^>]+> for everything up to, and including, the next ">"). For example:

    $ perl -E '
        my $x = qq{ <div class="soda odd">\n Power. Grace. Wisdom. Wonder.\n </div>};
        say $x;
        $x =~ m{<div class="soda[^>]+>\s*(.*?)\s*</div>}ms;
        say "|$1|";
    '
     <div class="soda odd">
     Power. Grace. Wisdom. Wonder.
     </div>
    |Power. Grace. Wisdom. Wonder.|

Again, I am not advocating using a regex to parse this type of data. Furthermore, if you do have "<div class="soda"></div>" earlier in the data, $1 will still be a zero-length string.
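Here's a rough sketch of that last point. The contents of $data are my guess at what your input might look like, not something you posted. It runs the same pattern against data with an empty "soda" div in front, then collects every match with /g and discards the empty captures. That second step is only one possible workaround, shown for illustration; a proper HTML parser remains the better option.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical data: an empty "soda" div precedes the one with content.
my $data = qq{<div class="soda"></div>\n<div class="soda odd">\n    Power. Grace. Wisdom. Wonder.\n</div>};

# Same pattern as in the one-liner above.
my $re = qr{<div class="soda[^>]+>\s*(.*?)\s*</div>}ms;

# A single match stops at the first div, so the capture is empty.
my ($first) = $data =~ $re;
say "first: |$first|";

# With /g in list context, the capture from every match is returned;
# grep then drops the zero-length ones.
my @all       = $data =~ /$re/g;
my @non_empty = grep { length } @all;
say "non-empty: |$_|" for @non_empty;
```

The first say shows an empty capture; the loop prints only the text from the second div.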

— Ken
