comment on

My suggestion, however, is to pass your inputs through "the Prussian stance" style of sanitization before you even deal with cosmetic cleanup (stripping whitespace). If the very first thing you do to your data is to retrieve the safe portions you want to work with, then the infiltration of tainted data through the rest of the program is minimized. It's far easier to sanitize close to the source of input than later on after the input may have been transformed and passed around to various other components of the application.

Excellent suggestion, and my plan is to do exactly that.

My point is that I was hoping that 'taint' would help me locate all the places where I get user input, and force me to deal with them explictely. But it seems taint mode is not that useful for that, because it makes too many false assumptions about (a) what is considered a "cleaning action" (i.e. capturing groups after a regexp match on a tainted variable), and (b) what uses of a tainted variable are "unsafe" (ex: it doesn't consider printing to the CGI script's STDOUT to be unsafe, eventhough that might result in printing malicious JS to the script's STDOUT).

Maybe I'm completely arrogant here, but it seems that taint mode is in fact counterproductive, because it lulls you into thinking that it spots all the places where you forgot to explicitly address (as in "at least think about") malicious inputs. But it doesn't, again because of its poor assumptions (a) and (b).

This seems like a design flaw in taint mode, which could easily be improved if the assumptions were changed to:

a) A tainted variable is considered to be untainted when the programmer explicitly invokes a function untaint() on it.
b) Any use of a tainted variable in a known dangerous situation raises an error (and that should include writing it to STDOUT).
c) You have to call untaint() on all tainted variables before the end of the process, otherwise an error will be raised (this forces you to take notice when you failed to explicitly address the tainting of that variable).

The main difference with taint mode's current assumptions are that in condition (a), the programmer has to do a much more explicit action to signal that a variable is untainted. In (b), we expand the list of unallowed operations to printing to STDOUT, which should catch a lot of cross scripting vulnerabilities. But I think it's dangerous to assume that we can catch all potential dangerous uses of a tainted variable, which is why I think we need to add a (c), to flag all tainted variables, where they are used in a way that is known to be dangerous or not.

Note that even with these modified assumptions, there is still a risk that you will use a tainted variable in a way that is dangerous but does not correspond to a known dangerous use, and that you use that variable before you have explicitly cleaned it and untainted() it.

However, I suspect that (c) would strongly encourage people to clean and untaint() their user intputs as soon as they acquire them, to prevent the tainting from spreading to other variables on which you will also have to invoke untaint().

Another idea I have is that maybe the "proper" way to deal with security in a Perl CGI context, would be to have a class SafeCGI, which would allow the programmer to "declare" what CGI arguments are admissible, and what types of cleanup to carry out on them (ex: HTML entity escaping, shell character escaping, etc...). If you declare an input without specifying cleanup options, then the default options are to apply everything but the kitchen sink.

Then, all you have to do is to make sure that you always go through SafeCGI instead of CGI, to acquire user inputs. In my case, that's easy enough, because all my application's dialog go through a single dialog factory, so there is only one place where CGI arguments are acquired.

The nice things about this approach are:

If you add a new CGI argument in a form and forget to declare it, you will know about it.
If you forget to specify what cleanup to do on a CGI argument, you will end up cleaning it to the max
Malicious users cannot feed non-defined arguments to your application

The only way you can mess up is if you acquire your CGI arguments through the CGI class instead of CGISafe. But if like me you centralize acquisition of CGI arguments in a central place, that's a non-issue.

I looked around for something like this on CPAN, but couldn't find one. Maybe I'll implement it myself...

I am curious to hear what people think about those ideas. Am I completely off the wall here?

In reply to Re^2: Taint mode limitations by alain_desilets
in thread Taint mode limitations by alain_desilets

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.