I've recently been tasked with building a replacement for a commercial system run by two industry competitors who are gouging entrepreneurs and developers for 50% of their net sales revenue.

The system I will be building will allow developers to log in, check/update/maintain their software listings (as well as their own profile and preferences), and give them a spiffy page + screenshot for each of their applications.

This means I'll have to accept and process some minimal forms of markup. Herein lies my philosophical paradox...

I've been building portals, web-based CMS systems and other things for years, and for the most part have limited the input accepted to plain text or a very small subset of acceptable markup. This system can't be that inflexible.

What is the best approach to allowing specific tags through (<p>, <br />, <a ...>, <img ...>) while disallowing all the others (<iframe>, <script>, <style>, etc.)?

I also have to take into consideration the dozens of ways to get XSS through, and protect against those.

Deny all, allow some? Filter all? Strip all and rewrap with allowed tags? Some other combination? I'd rather not have to run the HTML through a series of complicated subs to strip, massage, and de-fang the tags they're using, if possible.
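
To make the "deny all, allow some" option concrete, here is a minimal sketch that parses the input and re-emits only allowlisted tags and attributes, rather than running it through a chain of substitutions. It is written in Python using only the standard library's `html.parser`. The policy table, the names `ALLOWED`, `safe_url`, `AllowlistSanitizer` and `sanitize`, and the choice to permit only `http`/`https` and relative URLs are all assumptions made for illustration, not anything PerlMonks or Slashdot is known to use.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: tag -> attributes permitted on that tag.
ALLOWED = {"p": set(), "br": set(), "a": {"href", "title"}, "img": {"src", "alt"}}
VOID_TAGS = {"br", "img"}          # tags that never get a closing tag
URL_ATTRS = {"href", "src"}        # attributes whose value must be a safe URL
DROP_CONTENT = {"script", "style"}  # drop these tags AND the text inside them


def safe_url(value):
    """Accept relative URLs and http/https only (an assumed policy)."""
    # Remove spaces and control characters, which can hide a scheme
    # such as "java\tscript:".
    cleaned = "".join(ch for ch in value if ch > " ").lower()
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" not in head:
        return True  # no scheme before the first /, ? or #: relative URL
    return head.split(":", 1)[0] in {"http", "https"}


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: dropped, its text content is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # drops onclick, style, and anything unlisted
            if name in URL_ATTRS and not safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="evil()">Hi <a href="javascript:alert(1)">x</a></p>'))
```

Because the output is rebuilt from parsed pieces, anything not explicitly listed never reaches the page, and attribute values are decoded before the scheme check, so entity-encoded `javascript:` URLs are caught too. The sketch does not balance tags, so an unclosed <p> or a stray </a> passes through as written; a production system would more likely lean on a maintained sanitizing library than on hand-rolled code like this.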

I realize that PerlMonks, Slashdot and other large portal-like systems are doing this already. Which approaches and techniques work best for achieving this goal while still leaving developers a good level of customization for their own "listing" pages?


In reply to Pondering Portals by hacker
