PerlMonks  

HTML tags to be filtered out

by vroom (His Eminence)
on Aug 23, 2000 at 19:19 UTC ( [id://29205]=monkdiscuss )

Here's what I'm thinking of filtering out right now. If you've got additions, or vehemently oppose any of these tags being filtered out, respond below.
  • SCRIPT
  • APPLET
  • EMBED
  • OBJECT
  • INPUT
  • IFRAME
  • ILAYER
  • LAYER
Update: I think I'll basically steal the allowed HTML nate uses on Everything2.com. (Check it out here.) I'll allow A HREFs and basic TABLE, TR, and TD tags in addition for now. I think those are the only things you should need for a plain-old writeup.

We can come up with a new policy for home nodes, where more is allowed. The options I see for home nodes are allowing people to selectively allow additional tags to go through unfiltered when they view a home node, level requirements for certain tags, an approval process for home nodes with possibly dangerous tags, or an agreement that users implicitly need to follow when constructing their home nodes.

vroom | Tim Vroom | vroom@cs.hope.edu

Replies are listed 'Best First'.
RE: HTML tags to be filtered out
by nuance (Hermit) on Aug 23, 2000 at 19:36 UTC
    If you're about to write a filter anyway, and I assume that you are going to tell people that their input has been "censored", why not add a function that looks for PRE and TT tags? Anyone using them is probably someone who doesn't know about the code tag. We could then point out that using code tags has many other benefits: it allows people to wrap the entry, download the code directly, etc.

    This way we could help educate new people who obviously don't realise that the <code> functionality is there.

    Nuance

      I agree with you concerning <pre>, but not <tt>. I use <tt> tags around keywords and simple constructs that are inlined with the text of my post. I wouldn't want them treated as code, nor would I want the computer second guessing my intentions.
        Actually I was talking about the pre tt combination when used together. If someone knows how to use these tags, then when they're trying to represent a "code block" they'll "probably" use both tags together.

        I agree that we should not give people gratuitous warnings about using tt or even pre, but a gentle suggestion when they're using both together would be IMHO useful.

        In fact if you look at the post you were replying to, you will see that it uses tt tags. Sorry if my intent wasn't clear.

        Nuance

        I wouldn't mind a gentle warning about <tt></tt> every time I use those tags in a post, if it helped make Code Catacombs more useful.
            cheers,
            ybiC
RE: HTML tags to be filtered out
by chromatic (Archbishop) on Aug 23, 2000 at 20:16 UTC
    The problem with explicitly disallowing certain things is that you can't catch everything up front; you only learn what you missed when someone uses it.

    I suggest filtering out everything except certain tags. (TABLE and the basic formatting things would be all I allow.)
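A minimal sketch of the "filter out everything except certain tags" approach, written in Python purely for illustration (it is not the site's actual code). The `ALLOWED_TAGS` set, the `AllowlistFilter` class, and the `filter_html` function are all hypothetical names, and the tag set is only an example since the thread never settles on a final list. Disallowed tags are dropped, attributes on allowed tags are discarded, and entities such as `&nbsp;` and `&amp;` pass through unchanged.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration; the thread does not fix a final set.
ALLOWED_TAGS = {"b", "i", "p", "br", "table", "tr", "td"}


class AllowlistFilter(HTMLParser):
    """Keep allowed tags (without attributes); drop every other tag."""

    def __init__(self):
        # convert_charrefs=False so entities reach the handlers below intact.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # all attributes are discarded

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so stray angle brackets cannot become markup.
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")  # named entities such as &nbsp; survive

    def handle_charref(self, name):
        self.out.append(f"&#{name};")  # numeric entities survive too


def filter_html(text):
    """Return text with only allowlisted tags left as live markup."""
    parser = AllowlistFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Anything not named in the set is removed by default, so a tag nobody thought of in advance never gets through; widening the policy means adding one name to the set.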

RE: HTML tags to be filtered out
by Adam (Vicar) on Aug 23, 2000 at 20:33 UTC
    It's a sad day when heavy filtering of people's home nodes occurs. I fully agree that posts should be limited to text and code, but not home nodes. Maybe start people with heavy filtering and, as they move up the XP ladder, slowly give them back certain tags. I really believe that people who have invested the time into this site to attain certain levels will not abuse the privilege of more advanced tags like Script and Form. If you don't want people posting messages to the chatterbox from home node buttons, just say so. I think that every person with such a button would gladly change it. But please don't impose such a sweeping set of restrictions.
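The idea of giving tags back as people move up the XP ladder could be sketched as a table of thresholds. This is Python for illustration only; the level numbers, the tag groupings, and the names `TAGS_BY_LEVEL` and `allowed_tags` are all assumptions, not anything the site defines.

```python
# Hypothetical thresholds for illustration; no such table appears in the thread.
TAGS_BY_LEVEL = {
    1: {"b", "i", "p", "br", "code"},
    3: {"a", "ul", "ol", "li"},
    5: {"table", "tr", "td", "img"},
}


def allowed_tags(level):
    """Return every tag unlocked at or below the given level."""
    allowed = set()
    for threshold, tags in TAGS_BY_LEVEL.items():
        if level >= threshold:
            allowed |= tags
    return allowed
```

The resulting set could then be handed to whatever filter enforces the policy, so the level rules and the filtering logic stay separate.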
RE: HTML tags to be filtered out
by KM (Priest) on Aug 23, 2000 at 19:34 UTC
    <FORM...> and </FORM>. Adding those can subvert the forms on a page.

    Cheers,
    KM

      Yeah those were on my mental list but didn't make the listing above for some reason.

      vroom | Tim Vroom | vroom@cs.hope.edu
RE: HTML tags to be filtered out
by tilly (Archbishop) on Aug 23, 2000 at 19:50 UTC
    Well isotope makes a case for not allowing the table tag. At least until you are level 5. :-)

    Seriously, the right way to handle security is to explicitly list what is allowed and filter out all else. Add to what is allowed as the need/desire comes up.

    EDIT
    I should explain the isotope comment.

    At this moment there is an image snuck onto a novice's page through the table tag. Personally I think it is very respectfully done, but the point is that until you really stop and think about a construct, you have no idea what someone may come up with...

      Actually table tags can be very bad.
      Someone can use (no, I'm not going to demonstrate):
      [/TD][/TR][/TABLE]

      To close the current table, usually horribly breaking the rest of the page.
      I accidentally forgot to close a table once (on my home node), and nearly couldn't get back into the editor to undo the change.

      My personal suggestion is to start with the rtf format, and allow only those tags that would enable rtf-like formatting. Build from there, but slowly.

RE: HTML tags to be filtered out
by turnstep (Parson) on Aug 23, 2000 at 19:59 UTC

    Others that pop into my mind:

    • IMG (except on home nodes, I cannot fathom a use for them)
    • BLINK :)
    • SELECT and TEXTAREA
    • AREA
    • BUTTON
    • FRAME and FRAMESET
    • MAP

    What about embedded javascript things such as onMouseOver? You can embed some stuff right in the tag itself without using the SCRIPT tag...
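Script embedded in attributes is one reason filtering by tag name alone is not enough. A minimal Python sketch, for illustration only, of allowlisting attributes per tag: `ALLOWED_ATTRS`, `SAFE_SCHEMES`, and `clean_attrs` are hypothetical names, and the attribute choices are assumptions. The function takes `(name, value)` pairs in the shape Python's `html.parser` reports them.

```python
from html import escape

# Hypothetical per-tag attribute allowlist; the entries are illustrative.
ALLOWED_ATTRS = {"a": {"href"}, "td": {"colspan", "rowspan"}}
# Assumption: only absolute links with these prefixes are accepted.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


def clean_attrs(tag, attrs):
    """Return a string of only the attributes explicitly allowed for this tag."""
    kept = []
    for name, value in attrs:
        name = name.lower()
        if name not in ALLOWED_ATTRS.get(tag, set()):
            continue  # drops onMouseOver, onClick, style, and anything unlisted
        value = (value or "").strip()
        if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
            continue  # drops javascript: URLs (and, in this sketch, relative links)
        kept.append(f'{name}="{escape(value)}"')
    return " ".join(kept)
```

Because event handlers are never on the list, they are removed without the filter having to know every `on...` attribute name a browser might support.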

      Although I don't disagree with taking things like IMG and BLINK away, I think a good starting point is with tags that may pose a security concern, which is in keeping with the things vroom originally listed.

      Cheers,
      KM

        Actually, I'm starting to lean towards chromatic's option of "Exclude Everything, Allow Explicit". However, TT must stay!! (actually, all text markup should stay, with the possible exception of FONT and H?)

RE: HTML tags to be filtered out
by le (Friar) on Aug 23, 2000 at 20:45 UTC
    Like a good firewall layout, it could be easier to define what's allowed, not what's forbidden.

    My vote is for B, I, OL, UL, LI, A (although this could be a hole too), HR, BR, P; maybe IMG, maybe Hx, maybe PRE and maybe TT.

      Well, since we've started the "Allowed" list already, I'll add my $0.02, and rearrange the list from what I see as the least to most controversial:

      • CODE (of course)
      • STRONG, EM, B, I
      • BR, P
      • <!-- (comments - take a look, you might be surprised at what is already there!)
      • UL, OL, LI, MENU
      • DL, DT, DD (well, I still use them anyway)
      • BLOCKQUOTE (I use this often actually)
      • TT, ADDRESS
      • PRE (perlmonk's code does not always cut it)
      • SUB, SUP
      • A
      • DIV
      • HR

      Almost all of them should require a matching close tag as well.
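Requiring matching closers could be checked with a simple stack. The sketch below is Python for illustration only; `VOID_TAGS`, `BalanceChecker`, and `unbalanced` are hypothetical names, and treating only BR and HR as never needing a closer is an assumption. It reports stray closers, such as a post that tries to end the page's own table, as well as tags left open.

```python
from html.parser import HTMLParser

# Assumption: these are the only tags that never take a closing tag.
VOID_TAGS = {"br", "hr"}


class BalanceChecker(HTMLParser):
    """Track open tags on a stack and record closers that do not match."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # nothing was pushed for these, so nothing to pop
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def unbalanced(text):
    """Return a list of problems; an empty list means every tag is paired."""
    checker = BalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```

A post could be rejected, or returned to the author with the list of problems, whenever the result is non-empty.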

RE: HTML tags to be filtered out
by BigJoe (Curate) on Aug 24, 2000 at 00:00 UTC
    My only question with that is: if I am writing code in code tags, will it then remove them? For example:
    print "<input type=text name=foo>";


    --BigJoe

    Learn patience, you must.
    Young PerlMonk, craves Not these things.
    Use the source Luke.
      The correct answer is no. HTML screening occurs after the CODE conversion.
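Why the ordering matters can be shown with a small Python sketch, for illustration only and not the site's implementation. The names `convert_code`, `screen_html`, and `render` are hypothetical, and the regex denylist is a deliberately crude stand-in for the real screening step. Once the contents of a code block have been escaped, the screening pass no longer sees them as tags.

```python
import re
from html import escape

CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.IGNORECASE | re.DOTALL)
# Hypothetical denylist, used only to make the ordering visible.
BLOCKED_TAG = re.compile(r"</?(?:script|input|form|iframe)\b[^>]*>", re.IGNORECASE)


def convert_code(text):
    """Escape everything inside <code>...</code> so it displays literally."""
    return CODE_BLOCK.sub(
        lambda m: "<code>" + escape(m.group(1), quote=False) + "</code>", text
    )


def screen_html(text):
    """Remove blocked tags from whatever markup is still live."""
    return BLOCKED_TAG.sub("", text)


def render(text):
    # CODE conversion runs first, so tags quoted inside code survive as text.
    return screen_html(convert_code(text))
```

With the two steps reversed, the INPUT tag inside the quoted print statement would be stripped before it was ever escaped.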

      vroom | Tim Vroom | vroom@cs.hope.edu
Buzzcutbuddha (Denying tags is the way to go) - RE: HTML tags to be filtered out
by buzzcutbuddha (Chaplain) on Aug 24, 2000 at 00:08 UTC
    Having thought for a while about which way to go with filtering tags (only allowing certain ones, or only denying certain ones), I'm going to put my support with denying certain tags. Allowing only certain tags then limits our ability to include special characters:
    &nbsp;, &amp;, etc
    and that also means that when things move more towards XHTML, the filter needs to be extended or rewritten.

    I think it's easier to filter out the bad/naughty HTML than to make a special case for all of the different HTML features we individually use...
RE: HTML tags to be filtered out
by arturo (Vicar) on Aug 24, 2000 at 01:08 UTC
    With respect to table tags etc. ... those can be abused, can't they, to mess up the structure of the pages? Does anybody *really* need anything more than <code> and basic markup like <i>, <strong> and anchors? I think it's admirable that you're trying to keep things the way they *ought* to be, where everybody respects one another, but maybe that's just not in the cards.

    "He's got about as much personality as a loaf of bread" -- Wally Pleasant, She's in love with a Geek
