in reply to I match a pattern in regex, yet I don't get the group I wanted to extract for some reason

This would probably be a good application for HTML::Parser or any of the DOM-building modules that I am sure other monks will hasten to recommend.

However, your problem is probably that the "stretchy" groups in your pattern are not matching as you intend. I suggest (untested) m!<div class="soda[^"]*">(.*?)</div>! instead. The important difference is that [^"]* cannot match a double quote, so the initial "discard" part of the match stops at the closing quote of the class attribute and cannot run past the opening div tag; the non-greedy (.*?) then stops at the first </div>. Also note the use of ! as the delimiter, which avoids "leaning toothpick syndrome" (escaping every / in </div>).
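To make the "stretchy group" problem concrete, here is a small sketch. The original question's HTML and pattern are not reproduced here, so both the sample input and the greedy "stretchy" pattern below are hypothetical stand-ins. The greedy .* runs to the end of the string and backtracks only as far as the last '">', so the capture comes from the wrong div. The constrained [^"]* keeps the match inside the first opening tag.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample input (not the question's actual HTML).
my $html = '<div class="soda cola">Fizzy</div><div class="other">Flat</div>';

# Hypothetical "stretchy" pattern: greedy .* swallows the rest of the string,
# then backtracks only to the LAST '">', so $1 comes from the second div.
my ($stretchy) = $html =~ m!<div class="soda.*">(.*?)</div>!;

# Suggested pattern: [^"]* cannot cross the closing quote of the class
# attribute, so the match stays within the first opening tag.
my ($constrained) = $html =~ m!<div class="soda[^"]*">(.*?)</div>!;

say "stretchy:    $stretchy";      # stretchy:    Flat
say "constrained: $constrained";   # constrained: Fizzy
```

Both patterns "match", which is exactly why the symptom looks like a missing group rather than a failed match.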

If you are trying to catch multiple items from a single large input block, I suggest (also untested):

use feature 'say';    # say() needs this (or 'use v5.10;')
while (m!<div class="soda[^"]*">(.*?)</div>!g) { say "matched!"; my $grp = $1; say $grp; }
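A minimal sketch of the loop on a made-up input block, collecting every capture. One assumption is added beyond the loop above: the /s modifier. Without it, . does not match a newline, so (.*?) would miss any div whose text spans more than one line; drop /s if your items are always on one line.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical input with two matching divs; the second one's text spans two lines.
$_ = qq{<div class="soda cola">Fizzy</div>\n<div class="soda lime">Sour\nand sweet</div>\n};

my @found;
# In scalar context, m//g resumes from where the previous match ended (pos),
# so each pass of the loop finds the next div. /s lets . match newlines,
# which (.*?) needs when the wanted text spans more than one line.
while (m!<div class="soda[^"]*">(.*?)</div>!gs) {
    my $grp = $1;
    push @found, $grp;
    say "matched: $grp";
}
```

If you only need the captured strings, the same match in list context returns all of them at once: my @found = m!<div class="soda[^"]*">(.*?)</div>!gs;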

If the text you want does not contain additional HTML, you could also replace (.*?) with ([^<]*), which cannot run past the next tag. Generally, more constrained patterns like these also perform better, because the regex engine needs to backtrack less often.
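Here is a sketch of the ([^<]*) variant on a hypothetical sample, including the case where it stops working. A negated character class already matches newlines, so neither a non-greedy quantifier nor /s is needed; but nested markup such as <b> stops [^<]* early, and the whole match then fails.

```perl
use strict;
use warnings;

# Hypothetical sample input with two plain-text divs.
my $html = qq{<div class="soda cola">Fizzy</div>\n<div class="soda lime">Sour</div>};

# [^<]* cannot cross the next '<'. In list context, m//g returns every capture.
my @texts = $html =~ m!<div class="soda[^"]*">([^<]*)</div>!g;

# Limitation: nested markup stops [^<]* at the <b>, so this does not match.
my $nested    = '<div class="soda cola">Big <b>fizz</b></div>';
my $nested_ok = $nested =~ m!<div class="soda[^"]*">([^<]*)</div>! ? 1 : 0;

print "@texts\n";              # Fizzy Sour
print "nested: $nested_ok\n";  # nested: 0
```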

If the text you want can contain additional HTML, use HTML::Parser; it will work far better.
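For comparison, a rough sketch of the same extraction with HTML::Parser's event (callback) interface. It assumes HTML::Parser is installed from CPAN (it is not a core module), uses a hypothetical sample input, and mirrors the regex by selecting divs whose class begins with "soda". The depth counter for nested divs is this sketch's own design choice, not something HTML::Parser provides.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input: one soda div contains nested markup.
my $html = '<div class="soda cola">Big <b>fizz</b></div><div class="other">Flat</div>';

my @texts;        # decoded text collected from each matching div
my $depth = 0;    # > 0 while inside a matching div (counts nested divs)
my $buf   = '';   # text accumulated for the current matching div

my $p = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag, $attr) = @_;
        return unless $tag eq 'div';
        if ($depth) { $depth++ }                   # nested div inside a soda div
        elsif (($attr->{class} // '') =~ /^soda/) { $depth = 1; $buf = '' }
    }, 'tagname, attr' ],
    end_h => [ sub {
        my ($tag) = @_;
        return unless $tag eq 'div' && $depth;
        push @texts, $buf if --$depth == 0;        # closing the matching div
    }, 'tagname' ],
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [ sub { $buf .= shift if $depth }, 'dtext' ],
);
$p->parse($html);
$p->eof;

print "$_\n" for @texts;   # Big fizz
```

Unlike the regex versions, the nested <b> tag is simply reported as another event, so the text inside it is still collected.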

Re^2: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason
by Anonymous Monk on Jan 12, 2021 at 17:20 UTC
    I second this. Projects like this always grow to handle more cases, so an event-driven parser is the "future-proof" strategy.