There are too many variables involved with free-form markup such as HTML: whitespace can fall in arbitrary places, including within tags; tag attributes can change; and even the markup itself can change without altering the intent of the underlying text. While regular expressions are great for pattern matching, what you're doing goes beyond pattern matching into markup parsing. Regular expressions might form part of a full-fledged markup parser, but on their own they're not usually a complete solution.
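
To make that concrete, here is a small sketch in Python (the sample snippets and the names `VARIANTS`, `NAIVE` and `naive_extract` are made up for illustration). All three strings mark up the same paragraph, yet a pattern written with only the first form in mind matches just that one:

```python
import re

# Three snippets that a browser treats as the same <p> paragraph.
VARIANTS = [
    '<p>Hello</p>',
    '<P CLASS="intro">Hello</P>',  # uppercase tag names plus an attribute
    '<p\n  >Hello</p >',           # whitespace inside the tags themselves
]

# A naive pattern written for the first form only.
NAIVE = re.compile(r'<p>(.*?)</p>')

def naive_extract(html):
    """Return the paragraph text, or None when the pattern fails to match."""
    m = NAIVE.search(html)
    return m.group(1) if m else None

results = [naive_extract(h) for h in VARIANTS]
print(results)  # ['Hello', None, None]
```

Each fix (adding `re.IGNORECASE`, allowing attributes, allowing whitespace) handles one more variant, and the pattern grows more fragile with every patch.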

You really ought to be using something more robust than a regular expression approach. HTML::TokeParser and HTML::Parser are two possible alternatives, both of which can handle the intricate nuances of HTML. A regular expression that handles all the possibilities is difficult to construct correctly, and it remains fragile. An HTML parser is a more suitable tool for the job.
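
HTML::TokeParser and HTML::Parser are Perl modules; as a language-neutral sketch of the same idea, here is the event-driven approach using Python's standard-library `html.parser` (a stand-in for illustration, not either Perl module's API). The class `ParagraphText` and the helper `paragraphs` are hypothetical names:

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Hypothetical example: collect the text inside each <p>...</p>."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # The parser hands us a lowercased tag name and a separate list of
        # attributes, so <P CLASS="intro"> and <p> look identical here.
        if tag == 'p':
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == 'p' and self._in_p:
            self.paragraphs.append(''.join(self._buf).strip())
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

def paragraphs(html):
    """Return a list of paragraph texts found in the given HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.paragraphs

# The same three variants that trip up a naive regular expression.
VARIANTS = [
    '<p>Hello</p>',
    '<P CLASS="intro">Hello</P>',
    '<p\n  >Hello</p >',
]

print([paragraphs(h) for h in VARIANTS])  # [['Hello'], ['Hello'], ['Hello']]
```

Because the parser reports tag names, attributes and text as separate, normalized events, differences in case, attributes and whitespace inside tags stop mattering. HTML::Parser's callbacks follow a similar event-driven model in Perl, while HTML::TokeParser lets you pull tokens one at a time instead.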


Dave


In reply to Re: huge multiline regex by davido
in thread huge multiline regex by Anonymous Monk
