Tricky wrote:

In reply to Ovid and Abigail's comments: the TokeParser module is a great idea; the problem is that my remit is to investigate how regexps can be applied to reformatting HTML pages. I have a regexp for a background colour attribute, though the '#' character causes everything following it to be treated as a comment!

I'm not sure what you mean by your statement that your "remit is to investigate how regexps can be applied to reformatting HTML pages". If, by that, you mean that someone else has tasked you with this, then they have made a mistake. If someone comes to me and says "Ovid, I need you to deflea my cat. Here, use this shotgun", then I know that person made a mistake that's all too common in business. In short, the mistake is to say "here's a solution, let's see how we can make it fit our problem." That's absolutely the wrong way to go about things.

Mind you, it's an easy thing to do. I suspect that cyanide kills fleas. Therefore, I might ask a friend "how can I use cyanide to deflea my cat?" When that friend tells me to use flea powder, my first instinct shouldn't be "but I've got all of this cyanide handy, how do I use that?" Instead, a better tactic is to revisit the original problem. How do I remove the fleas from my HTML ... er ... cat? If the proposed solution is better than mine, I should be willing to swallow my pride and go with the best solution. Heck, if all politicians believed that, we'd have a much better country :)

Just for giggles, let's look at some valid HTML tags:

<a href="foobar.txt" onclick="javascript:go_boom()">stuph</a>
<A HREF =foobar.txt ONCLICK='javascript:go_boom()'>stuph</a>
<A HREF = 'foobar.txt' ONCLICK= 'javascript:go_boom()' > stuph </a >
<font color="#FAFA519">test</font>
<font color="FAFA519">test</font>
<font color="fafa519">test</font>
<font color=fafa519>test</font>
<font color='fafa519'>test</font>
<font color=fafa519 >test</font>

Do you like all of those font tags? Most browsers will render all of them identically. That's a great example of why most regular expressions will fail: the quoting, case, and whitespace can all vary, and a pattern that copes with every variation is tough to write.
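To make that concrete, here's a rough sketch run against the six font tags above. It's in Python rather than Perl purely so that it runs with nothing but the standard library; `html.parser` stands in for HTML::TokeParser here. The names `NAIVE`, `FontColors`, `colors_by_regex` and `colors_by_parser` are made up for this illustration, and the regex is just the sort of first draft someone might write, not anybody's actual code.

```python
import re
from html.parser import HTMLParser

# The six font tags from the list above.
FONT_TAGS = [
    '<font color="#FAFA519">test</font>',
    '<font color="FAFA519">test</font>',
    '<font color="fafa519">test</font>',
    '<font color=fafa519>test</font>',
    "<font color='fafa519'>test</font>",
    '<font color=fafa519 >test</font>',
]

# Hypothetical first-draft pattern: assumes double quotes and a leading '#'.
NAIVE = re.compile(r'<font color="(#[0-9A-Fa-f]+)">')


class FontColors(HTMLParser):
    """Collects the color attribute of every font start tag."""

    def __init__(self):
        super().__init__()
        self.colors = []

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased names and stripped any quotes.
        if tag == "font":
            for name, value in attrs:
                if name == "color":
                    self.colors.append(value)


def colors_by_regex(html):
    return NAIVE.findall(html)


def colors_by_parser(html):
    parser = FontColors()
    parser.feed(html)
    parser.close()
    return parser.colors


html = "\n".join(FONT_TAGS)
regex_hits = colors_by_regex(html)
parser_hits = colors_by_parser(html)
print(len(regex_hits), "of", len(FONT_TAGS), "found by the regex")
print(len(parser_hits), "of", len(FONT_TAGS), "found by the parser")
```

The draft regex only catches the first tag, while the parser hands back a colour value for all six. You can keep patching the pattern for single quotes, missing quotes and stray spaces, but that's exactly the flea powder versus cyanide problem.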

But just to show you that I'm a good sport about how to deflea your cat, here's a link to Tom Christiansen's article, HTML Hacking with Regular Expressions. Enjoy!

Cheers,
Ovid

New address of my CGI Course.


In reply to How to Deflea a Cat by Ovid
in thread Regexps to change HTML tags/attributes by Tricky
