Re (tilly) 2: Opinions needed on CGI security
by tilly (Archbishop) on Feb 14, 2001 at 08:23 UTC
Consider the following data:
<<script (not real script)>script (dirty nasty stuff)>
Trust me, this is not a bogeyman. merlyn is saying that it is a mistake to think in terms of just trying to remove known dangerous constructs, because it is a mistake: strip the inner <script ...> once and a working <script> tag is left behind. That way lies madness.
Decide what you will allow, and explicitly escape everything that does not fit a known and specified safe pattern.
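For illustration, a minimal Perl sketch of both points; the sample string and the escape_unsafe helper are made up, and the exact "safe" character set depends on your application:

#!/usr/bin/perl
use strict;
use warnings;

# Naive filtering: strip <script ...> and </script ...> tags.
my $dirty = '<<script>script>alert("gotcha")</</script>script>';
(my $naive = $dirty) =~ s{</?script[^>]*>}{}gi;
print "after naive strip: $naive\n";   # a live <script> tag survives

# Allow-list escaping: anything outside a known safe pattern is
# turned into a harmless HTML entity instead of being hunted down.
sub escape_unsafe {
    my ($text) = @_;
    $text =~ s/([^\w\s.,!?'-])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}
print "escaped: ", escape_unsafe($dirty), "\n";

The first print shows <script>alert("gotcha")</script> surviving the naive strip, which is exactly the failure mode above; the second never needs to know what a script tag is.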
I understand completely about filtering by what you will accept and not trying to imagine what to reject... I said as much in my post. My question is, if you have a CGI that does:
$a = "some CGI data <<script blah blah evil stuff";
open (F, ">>file.txt");
print F $a;
... and that's the sum total of the CGI's interaction with the rest of the world, what could a hacker (or anyone) do that's evil? Now, if you will (say) be outputting a web page based on this data later on, that's a different story... but that's not the question.
My point is that I agree wholeheartedly that we should be as diligent as necessary to secure our programs and our data. But at some point (and this is a good example) "diligence" turns into unnecessary paranoia.
Gary Blackburn
Trained Killer
Update: Ok, so maybe the point from the original poster was to use the data to populate a web page. :-P Seems to me in that case that there's no reliable way of filtering out all possible evil HTML/Javascript (please, someone correct me if there is). But other than that, what else does the poster need to do?
Along merlyn's lines, an easy way to ensure that the user remains within your bounds is to imagine what Apache does when one specifies
Order deny,allow
If the input doesn't meet your guidelines, reject the whole thing on sight of the first mistake. Yes, simply return "no dice" to the user. It's simply not worth trying to make replacements that then have to be checked all over again themselves. In the example above, if one wants to reject certain data, then reject the whole shebang if any "evil hacker stuff" is included. That means first clearly defining which tags ARE acceptable. If ANYTHING else is used, reject the entire input and ask the user for clarification. That's not exactly what this site does, but PM does restrict tags along these lines. The other way is just too hazy. From the CGI.pm pod (I've always found this entertaining):
If you import a function name that is not part of CGI.pm, the module will treat it as a new HTML tag and generate the appropriate subroutine. You can then use it like any other HTML tag. This is to provide for the rapidly-evolving HTML "standard." For example, say Microsoft comes out with a new tag called <GRADIENT> (which causes the user's desktop to be flooded with a rotating gradient fill until his machine reboots). You don't need to wait for a new version of CGI.pm to start using it immediately:
use CGI qw/:standard :html3 gradient/;
print gradient({-start=>'red',-end=>'blue'});
If you only filter script tags, then you're missing this DoS HTML tag. On the other hand, if you know which tags are good and ignore all others, you're set for life without trying to track down new exploits.
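For illustration, a minimal sketch of that allow-list attitude; validate_input and the tag list here are made up, but the point is that the unknown <gradient> tag bounces the whole submission:

#!/usr/bin/perl
use strict;
use warnings;

# Define the tags you accept, and reject the entire submission on the
# first tag that is not on the list -- no repairs, no partial cleanup.
my %allowed_tag = map { $_ => 1 } qw(p em strong code a ul ol li blockquote);

sub validate_input {
    my ($text) = @_;
    while ($text =~ m{</?([a-zA-Z][a-zA-Z0-9]*)}g) {
        my $tag = lc $1;
        return "Sorry, the <$tag> tag is not accepted here -- please resubmit."
            unless $allowed_tag{$tag};
    }
    return;   # undef means the input passed
}

if (my $error = validate_input(q{<p>hello</p><gradient start="red">})) {
    print "$error\n";   # the whole post is bounced, <gradient> included
}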
AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
There are a few ways to get almost all known HTML/JS evils out of the way...
The simplest, and a very effective one, is to simply HTML-escape (entity-encode) everything that comes in, like the PerlMonks.com <code> tag does. The following JavaScript is harmless: <script>alert("I am malevolent");</script> because it has been turned into &lt;script&gt;alert(&quot;I am malevolent&quot;);&lt;/script&gt; before your browser sees it.
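A sketch of that first approach, assuming the HTML::Entities module is available (the $post sample is made up):

use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Encode everything: the markup survives as visible text, but the
# browser never gets to treat it as markup.
my $post = '<script>alert("I am malevolent");</script>';
print encode_entities($post), "\n";
# &lt;script&gt;alert(&quot;I am malevolent&quot;);&lt;/script&gt;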
If you like certain HTML constructs, allow only them, like the <p> & <em> tags I'm using in this post (but not the <form> tag here: <form><input type="text" size="2"></form>).
For better safety, as well as flexibility in presentation (HTML, WML, PDF, &c.), using an HTML->internal form->presentation form sequence might be desirable; e.g. using an XML dialect with no scripting, &c. internally.
The only "badness" I know of which can't be readily filtered out this was is an hyperlink containing potentially malicious content, e.g. a link to a site that does evil things, or (but I don't think any current browsers are troubled by this) a buffer overrun in the URL itself or sommat.
But, I'm sure someone will think of something interesting that can be done with <p> in IE 6, and we'll all be back to the drawing board :-)
There are certainly ways of filtering out all possibly evil HTML and Javascript, but they may be too restrictive for your application. For instance, you could launder CGI data through /([a-zA-Z0-9_&;\s]*)/, which would disallow all HTML except for entities, but this would be much too restrictive for a site like PerlMonks where we need to be able to post code.
Constructing a character class that filters out bad stuff is trivial. On the other hand, constructing a hack-proof set of regexen that permit specific combinations of characters while disallowing others (as in allow <a> but disallow <script> while allowing '<' and '>' if inside a code block) is far from easy.
Everything's implementation of the latter is something you might want to take a look at.
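For the former, a minimal sketch of laundering a value through the character class mentioned above (launder() is a hypothetical helper, and this set really is too strict for a site that has to accept code):

use strict;
use warnings;

sub launder {
    my ($value) = @_;
    # Capture only the leading run of allowed characters.
    my ($clean) = $value =~ /([a-zA-Z0-9_&;\s]*)/;
    return $clean;
}

print launder('hello &amp; goodbye <script>evil()</script>'), "\n";
# prints "hello &amp; goodbye " -- everything from the first
# disallowed character onward is simply dropped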
MeowChow
s aamecha.s a..a\u$&owag.print
Thank you all for the comments so far.
The program is a message board, so the only time the user data is used directly is when a page is generated.
I was trying to use the method of specifying what I will allow, as mentioned by merlyn. Of all the potential input fields, there are four for which I had to specify what I won't allow instead of what I will allow. This is because these fields can contain HTML, and since there are a lot of acceptable tags I thought it prudent to specify the ones I don't want.
I've read all I can find on security with CGIs and never found much that directly related to my program, but after seeing all the different methods used to attack a program I thought I'd better do some basic filtering.
This problem grew because I don't spend my time trying to break other people's code, so I am probably unaware of common "hack" attempts.
I know my code leaves it possible to have an unbalanced tag like <table>, and thus the generated page may not display, but I haven't found any method that will match opening and closing tags.
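One possibility is a simple stack-based balance check over an allow-list of tags; a minimal sketch (tags_balanced is a made-up helper, and HTML::Parser would be sturdier for real pages):

use strict;
use warnings;

sub tags_balanced {
    my ($html, @allowed) = @_;
    my %allowed = map { $_ => 1 } @allowed;
    my @stack;
    while ($html =~ m{<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>}g) {
        my ($closing, $tag) = ($1, lc $2);
        next unless $allowed{$tag};          # ignore tags handled elsewhere
        if ($closing) {
            return 0 unless @stack && pop(@stack) eq $tag;
        } else {
            push @stack, $tag;
        }
    }
    return @stack ? 0 : 1;                   # leftovers mean an unclosed tag
}

print tags_balanced('<table><tr><td>hi</td></tr></table>', qw(table tr td)), "\n";  # 1
print tags_balanced('<table><tr><td>hi</td></tr>',         qw(table tr td)), "\n";  # 0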
In the hope that I'm not becoming completely paranoid, is there any standard filtering that I'm not using to minimize vulnerability?
Thanks for all the advice, you guys will make a programmer out of me yet!
-- Brian