G'day SergioQ,

I feel I should start by echoing what others have said about not using a regex for parsing this type of data.

You haven't shown all the data (which was probably a good move if it's huge). However, I suspect you have something like '<div class="soda"></div>' earlier in the data; that would explain why $1 is a zero-length string (assuming that's what you meant by "is nothing").

I recommend that you use Regexp::Debugger to see exactly what is being matched.

As a general rule, peppering a regex with .* or .*? is a bad move: it will often produce unexpected, or at least unanticipated, results.
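Here's a minimal sketch of the sort of surprise I mean. The sample strings are made up for illustration; they're not from your data.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample data: two tags on one line.
my $html = '<b>one</b> and <b>two</b>';

# Greedy .* runs on to the LAST </b>, swallowing the markup in between.
my ($greedy) = $html =~ m{<b>(.*)</b>};

# Non-greedy .*? stops at the first </b> here ...
my ($lazy) = $html =~ m{<b>(.*?)</b>};

# ... but it will still cross tag boundaries whenever that is the only
# way for the rest of the pattern to match.
my $html2 = '<b>one</b> <i>x</i> <b>two</b>!';
my ($crossed) = $html2 =~ m{<b>(.*?)</b>!};

say "greedy:  |$greedy|";
say "lazy:    |$lazy|";
say "crossed: |$crossed|";
```

Only the second capture is the one most people would expect; the other two contain tags.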

If you want to match all characters up to, and including, some terminal character, then match all the characters that aren't the terminal character followed by the terminal character. For example:

    $ perl -E '
        my $x = qq{<div class="soda odd">\n    Power. Grace. Wisdom. Wonder.\n</div>};
        say $x;
        $x =~ m{<div class="soda[^>]+>\s*(.*?)\s*</div>}ms;
        say "|$1|";
    '
    <div class="soda odd">
        Power. Grace. Wisdom. Wonder.
    </div>
    |Power. Grace. Wisdom. Wonder.|
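The same "everything that isn't the terminator, then the terminator" idea works for other delimiters too. A rough sketch, using a made-up string (the "cola" div is purely hypothetical):

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical sample data, not taken from the original post.
my $tag = '<div class="soda odd">Power. Grace.</div><div class="cola">Fizz.</div>';

# Attribute value: every character that is not a double quote,
# followed by the closing double quote.
my ($class) = $tag =~ m{class="([^"]*)"};

# Text content: every character that is not '<', followed by '<'.
# This assumes the div holds plain text with no nested tags.
my ($text) = $tag =~ m{>([^<]*)<};

say "class: |$class|";
say "text:  |$text|";
```

Because a negated character class can never run past its terminator, neither capture can spill into the second div.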

Again, I am not advocating using a regex to parse this type of data. Furthermore, if you do have '<div class="soda"></div>' earlier in the data, $1 will still be a zero-length string.
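To illustrate that last point, here's a sketch that assumes an empty "soda" div really does come first (that's my guess about your data, not something you've shown). Collecting every match with /g and skipping the empty ones is one possible workaround; it's still no substitute for a proper HTML parser.

```perl
use strict;
use warnings;
use feature 'say';

# Assumed data: an empty "soda" div ahead of the one you actually want.
my $data = qq{<div class="soda"></div>\n<div class="soda odd">\n    Power.\n</div>};

my $re = qr{<div class="soda[^>]+>\s*(.*?)\s*</div>}ms;

# A single match stops at the first div, so the capture is empty.
my ($first) = $data =~ $re;

# In list context, /g returns the capture from every match.
my @all = $data =~ /$re/g;

# Keep only the captures that actually contain something.
my ($non_empty) = grep { length } @all;

say "first:     |$first|";
say "non-empty: |$non_empty|";
```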

— Ken


In reply to Re: I match a pattern in regex, yet I don't get the group I wanted to extract for some reason by kcott
in thread I match a pattern in regex, yet I don't get the group I wanted to extract for some reason by SergioQ
