I seem to always be the one doing this, but here goes:

Why are you matching HTML with regexes? It's dangerous and fraught with peril, as well as being impossible to maintain or get right. Why not use something like, oh, HTML::Parser? Let it deal with the problem of figuring out what goes where, and you just ask it "Does tag ABC have attributes X, Y, and Z?"
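
To make the "just ask the parser" idea concrete, here is a minimal sketch. It uses Python's standard-library html.parser as a stand-in for HTML::Parser, purely so that it is self-contained; the class name AttributeChecker, the function tag_has_attributes, and the sample markup are all made up for illustration. The point is that attribute order stops mattering once a parser hands you the attributes as data.

```python
from html.parser import HTMLParser


class AttributeChecker(HTMLParser):
    """Records whether any start tag of the given name carries all required attributes."""

    def __init__(self, tag, required):
        super().__init__()
        self.tag = tag.lower()
        self.required = {name.lower() for name in required}
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order;
        # the parser has already lowercased the tag and attribute names.
        if tag == self.tag:
            present = {name for name, _value in attrs}
            if self.required <= present:
                self.found = True


def tag_has_attributes(html, tag, required):
    """Return True if some <tag> in html has every attribute in required, in any order."""
    checker = AttributeChecker(tag, required)
    checker.feed(html)
    checker.close()
    return checker.found


# Hypothetical sample markup: the attributes appear in "alt, width, src" order.
sample = '<p>hi</p><img alt="logo" width="10" src="logo.png">'
print(tag_has_attributes(sample, "img", ["src", "alt", "width"]))  # True
print(tag_has_attributes(sample, "img", ["src", "height"]))        # False
```

The same shape should carry over to an HTML::Parser start-tag handler, though that translation is an assumption here and not something shown above.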

Or ... attack the problem another way. Either these pages are static or they're not. If they are, then read them by hand. No matter how many you have, so long as they don't change, you'll finish eventually. (If there's a very large number of them, q.v. solution #1.)

If they're generated in some fashion, then don't examine the output, examine the generator! A quick code review with a colleague and a whiteboard will quickly tell you if you're double-generating attributes. Now, if the code is dense and impenetrable, that's a good reason to rewrite it, and in the process guarantee that this issue is a non-starter.

Now, you might have issues with the idea of HTML being embedded in the code. Get it out and use templates. HTML doesn't belong in code, and vice-versa.
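
As a rough illustration of keeping markup out of the logic, here is a small sketch using Python's standard-library string.Template. The template text, the placeholder names, and render_heading are hypothetical; in practice the template would sit in its own file, and a fuller templating system may suit a real site better.

```python
from html import escape
from string import Template

# Hypothetical template text; in a real project this would live in its own
# file so the markup can change without touching the code.
HEADING_TEMPLATE = Template('<h1 class="$css_class">$title</h1>')


def render_heading(title, css_class):
    """Fill the template, escaping values so they cannot break the markup."""
    return HEADING_TEMPLATE.substitute(
        title=escape(title),
        css_class=escape(css_class),
    )


print(render_heading("Q&A", "headline"))
# prints: <h1 class="headline">Q&amp;A</h1>
```

Because each attribute is written exactly once in the template, a doubled attribute becomes something you can spot by reading one line of markup.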

Of course, this entire discussion begs the question - why aren't you using CSS?

------
We are the carpenters and bricklayers of the Information Age.

Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.


In reply to Re: Operator for "these expressions, in any order" by dragonchild
in thread Operator for "these expressions, in any order" by jsalvata
