This would probably be a good application for HTML::Parser or any of the DOM-building modules that I am sure other monks will hasten to recommend.

However, your problem is probably that the "stretchy" groups in your pattern are not matching as you intend. I suggest (untested) m!<div class="soda[^"]*">(.*?)</div>! instead. The important difference is that this alternative constrains the initial "discard" match to not include double quotes, and therefore not to run past the opening div tag. Also note the use of ! as delimiter to avoid "leaning toothpick syndrome" in this version.
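To make the difference concrete, here is a small sketch. The sample input and the "loose" pattern are my own assumptions for illustration (I am guessing at the general shape of the original pattern, not quoting it); only the "tight" pattern is the one suggested above.

```perl
use strict;
use warnings;

# Hypothetical sample input; the class names and text are made up.
my $html = '<div class="soda pop">Cola</div><div class="other">Tea</div>';

# Unconstrained (assumed shape of the problem pattern): the greedy .* can
# run past the closing quote and past the end of the opening div tag.
my ($loose) = $html =~ m!<div class="soda.*">(.*?)</div>!;

# Constrained: [^"]* has to stop at the closing quote of the class attribute.
my ($tight) = $html =~ m!<div class="soda[^"]*">(.*?)</div>!;

print "loose: $loose\n";    # loose: Tea
print "tight: $tight\n";    # tight: Cola
```

The loose version still "matches", which is probably why this kind of bug is confusing: the greedy .* backtracks only as far as the last "> in the input, so the capture comes from the wrong div.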

If you are trying to catch multiple items from a single large input block, I suggest (also untested):

while (m!<div class="soda[^"]*">(.*?)</div>!g) {
    say "matched!";
    my $grp = $1;
    say $grp;
}

If the text you want does not contain additional HTML, you could also replace (.*?) with ([^<]*). Generally, more constrained search patterns like these will also perform better because they will need backtracking less often.
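A minimal sketch of that variant, again with made-up sample input. It also uses the match in list context with /g, which returns all the captures at once instead of looping:

```perl
use strict;
use warnings;

# Hypothetical sample input with plain text only inside the divs.
my $html = '<div class="soda a">Cola</div> <div class="soda b">Root beer</div>';

# [^<]* cannot cross a tag boundary, so each capture stays inside its own div.
# In list context, m//g returns every capture group from every match.
my @names = $html =~ m!<div class="soda[^"]*">([^<]*)</div>!g;

print "$_\n" for @names;    # Cola, then Root beer
```

Note that a div whose content includes any tag at all (say, a <b> element) will simply not match with this pattern, since [^<]* stops at the first < and the pattern then requires </div> right there.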

If the text you want can contain additional HTML, use HTML::Parser; it will work far better.
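For what that might look like, here is a rough sketch using the HTML::Parser version 3 handler API. The sample input, the variable names, and the rule "class attribute begins with soda" (chosen to mirror the regex above) are my assumptions; adapt them to the real page.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input: the wanted div contains nested markup.
my $html = '<div class="soda pop">Diet <b>Cola</b></div><div class="other">Tea</div>';

my $depth = 0;    # greater than 0 while inside a wanted div; counts nested divs
my @found;        # one string of collected text per wanted div

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag, $attr) = @_;
        return unless $tag eq 'div';
        if ($depth) {
            $depth++;                          # nested div inside a wanted one
        }
        elsif (($attr->{class} // '') =~ /^soda/) {
            $depth = 1;                        # entering a wanted div
            push @found, '';
        }
    }, 'tagname, attr' ],
    # Text may arrive in several chunks, so append rather than assign.
    text_h => [ sub { $found[-1] .= $_[0] if $depth }, 'dtext' ],
    end_h  => [ sub { $depth-- if $depth && $_[0] eq 'div' }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @found;    # Diet Cola
```

The depth counter is there so that a nested div inside a wanted one does not end collection early, which is exactly the case a regex cannot handle reliably.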


In reply to Re: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason by jcb
in thread I match a pattern in regex, yet I don't get the group I wanted to extract for some reason by SergioQ
