Re: Back to acceptable untainted characters

Characters aren't dangerous to your Perl program in themselves. Passing them along to something else that may interpret them specially is what's dangerous. And knowing how the components you interact with will interpret the characters is the key to security (at least from this class of problems).

For example:

If you're taking user input and using it in eval, well, you should probably find another way of doing what you're doing. But let's pretend there's some reason you have to, as an illustration. You could write something like this:
```
eval "print OUT '$unsafe_input' or die";
```
This isn't safe, and we can see why by thinking about how Perl will interpret the user's input. Once the input breaks out of the quoted string, a variable identifier will be interpreted, which might give away secret information (think $unsafe_input="',\$DATABASE_PASSWORD,'"), so the this-is-a-variable characters $, @, and % are unsafe. Escaping from that quoted string is itself a real problem (think $unsafe_input="'; system('cat /etc/passwd'); print '"), so single quotes are dangerous.
As you mentioned, if you're using MySQL, single quotes are dangerous.
To the shell, shell metacharacters are dangerous, so you have to be particularly careful of $;*&|?.
If you're printing the user's input to a Web page, you'd better make sure it doesn't have HTML tags in it, or else your Web site will be vulnerable to a Cross-Site Scripting attack. So you'd better block or escape the characters that create HTML tags, < and >. Taint mode doesn't catch this one.
If you're using a user's input to speak a network protocol, say SMTP, you have to be careful of a single dot on a line by itself, since it ends the message data, and whatever follows is read as SMTP commands. If you're taking the body of a message, for example, and sending it over SMTP, the user could put ".\r\nMAIL FROM:<somebody_else>\r\nRCPT TO:<victim>\r\nDATA\r\n..." into the body to end the message they were sending and start one of their own, perhaps to spam.
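For the SMTP case, the protocol defines its own escape: the sender doubles any dot that starts a line of the body ("dot-stuffing"), so the body can never contain the lone-dot terminator. Here's a rough sketch, assuming the body arrives as one string. The name dot_stuff is a hypothetical helper of mine, and a real mail module would normally be expected to do this for you.

```perl
use strict;
use warnings;

# Hypothetical helper: make a message body safe to send after SMTP's DATA command.
sub dot_stuff {
    my ($body) = @_;
    # Normalize every line ending (CRLF, bare CR, bare LF) to CRLF.
    $body =~ s/\r\n|\r|\n/\r\n/g;
    # Double any dot at the start of a line so no line is a lone ".".
    $body =~ s/^\./../mg;
    return $body;
}

# The attack body from the text: a lone dot followed by a smuggled command.
my $attack = "Hi\r\n.\r\nMAIL FROM:<somebody_else>\r\n";
print dot_stuff($attack);
```

The receiving server strips the extra dot again, so the recipient sees the body exactly as the user typed it.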

Once you know what sorts of characters are unsafe, you need to stop them from being interpreted by the program you're interacting with. The two ways to do that are to disallow them or to escape them. Escaping is usually riskier because it's easier to make a mistake. For example, let's say that eval put the input in a double-quoted string and you're trying to fix it with $user_input =~ s/(["\@\$\%])/\\$1/g;. Well, what if $user_input='\"; cat /etc/passwd; print \"rest'? Your RE replaces each " character with \", so the \" becomes \\": an escaped backslash and an unescaped quote. Yikes! The solution is to also escape the backslash. Now \" turns into \\\", which is an escaped backslash followed by an escaped quote.
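To show the backslash fix doing its job, here's a minimal sketch for illustration only. The helper name escape_for_double_quotes is hypothetical, and I'm assuming the target is a double-quoted Perl string literal, where backslash, ", $ and @ are the characters that need escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: escape text for use inside a double-quoted Perl string literal.
sub escape_for_double_quotes {
    my ($text) = @_;
    # Backslash is in the class, so an attacker's own backslashes get escaped too.
    $text =~ s/([\\"\$\@])/\\$1/g;
    return $text;
}

my $user_input = '\"; system("cat /etc/passwd"); print \"rest';
my $escaped    = escape_for_double_quotes($user_input);

# The generated code is a single string literal, so eval just hands back the original text.
my $round_trip = eval qq{"$escaped"};
print $round_trip eq $user_input ? "literal preserved\n" : "escaping failed\n";
```

Even with escaping that round-trips, not using eval on user input at all remains the safer choice.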

The other option is to disallow them altogether. This is safer, since it's easier to do this correctly, but it can be restrictive. If you're asking a user to enter a passage from a book, it may not be acceptable to disallow quotation marks. If you're asking a user for a password, you shouldn't reject any characters.

The final thing to keep in mind is that when you're restricting characters, it's safer to list the characters you know are safe than the ones you know aren't. That way, if you make a mistake, you've erred on the side of caution.

Taint mode is designed to help you do this, but it only works when it knows which input sources are unsafe, which interactions are unsafe, and when you tell it how to make user input safe for use. You should be using taint mode, but only as a tool for catching you when you make a mistake, not as a primary line of defense.
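To make the last two points concrete, here's a rough sketch of allowlist validation that doubles as untainting, since under -T a value captured from a regex match is treated as clean. The name clean_username and the particular character set are my own assumptions for illustration; pick the set your application really needs.

```perl
use strict;
use warnings;

# Hypothetical helper: accept a username only if every character is on the allowlist.
# Returns the captured (and, under -T, untainted) value, or undef on rejection.
sub clean_username {
    my ($input) = @_;
    return undef unless defined $input;
    # \A and \z anchor the whole string; a bare $ would allow a trailing newline.
    return $input =~ /\A([A-Za-z0-9_.-]{1,32})\z/ ? $1 : undef;
}

for my $candidate ('alice_01', 'bob; rm -rf /', 'eve$PATH') {
    my $clean = clean_username($candidate);
    printf "%-15s => %s\n", $candidate, defined $clean ? 'accepted' : 'rejected';
}
```

Because the pattern names what is allowed, a character I forgot about is rejected rather than let through.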

Whenever you're interacting with some system that a user can't normally interact with (a database you're authenticated to, a shell on a public Web server), think hard about what an attacker could do to make a mess of things, and then prevent it. Try a few things, and see how they're handled. Getting a particularly devious friend or co-worker to think of ways to subvert your system can be effective.
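For the shell case specifically, one way to prevent the mess is to never hand the input to a shell at all. Here's a minimal sketch, assuming a Unix-like system: the list form of a piped open passes each argument straight to the program, so metacharacters arrive as plain text. The name echo_via_child is a hypothetical helper, and the perl one-liner it runs is only a stand-in for a real external command.

```perl
use strict;
use warnings;

# Hypothetical helper: run a child process with the input as a single argument,
# using the list form of a piped open so that no shell is involved.
sub echo_via_child {
    my ($arg) = @_;
    # $^X is the running perl binary, standing in for whatever program you really call.
    open(my $child, '-|', $^X, '-e', 'print $ARGV[0]', $arg)
        or die "cannot start child: $!";
    my $output = do { local $/; <$child> };   # slurp everything the child printed
    close($child) or die "child failed: $? $!";
    return $output;
}

# The semicolon is just another character in the argument; nothing extra runs.
print echo_via_child('report.txt; cat /etc/passwd'), "\n";
```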

A final note is that some modules can provide extra information to taint mode, such as telling DBI to treat all queries as an interaction that requires taint checking, or telling CGI that its output should be taint checked. I don't recall the names of these modules, but CPAN should be able to find them.

Update: Fixed eval example near top so it's actually insecure.

Re: Re: Back to acceptable untainted characters by bradcathey (Prior) on Sep 07, 2003 at 17:38 UTC
sgifford, would you or someone mind explaining this in greater detail: "If you're printing the user's input to a Web page, you better make sure it doesn't have HTML tags in it, or else your Web site will be vulnerable to a Cross-Site Scripting attack. So, you'd better prevent code to create HTML tags, such as ><. Taint mode doesn't catch this one." Currently, using HTML::Template, I'm doing stuff like this with data from my db: `my $html = "<b>Signed up:</b>\n <table><tr><td>$data</td></tr> </table>\n"; $template->param(html => $html);` Cool or not? Thanks
3Re: Back to acceptable untainted characters by jeffa (Bishop) on Sep 07, 2003 at 17:51 UTC
Barring security issues: Well, you can do it that way ... or you could set up a "widget" called `signed_up.tmpl`: `<b>Signed up:</b> <table><tr><td><tmpl_var data></td></tr> </table>` Include that in the main page: `<tmpl_include signed_up.tmpl>` And just make sure that the HTML::Template object responsible for populating the main page handles that `<tmpl_var data>` tag. I discuss this technique more at 3Re: HTML::Template - complex sites. Feel free to play with the code I have posted there.
Now then, as for security ... if you don't want to allow your users to submit HTML, the easiest hack you can do is: `my $data = '<html>evil tags!!</html>'; $data =~ s/</&lt;/g;` This will convert all < characters to `&lt;`, which will effectively keep the tag from rendering.
jeffa
encoding entities (Re: Back to acceptable untainted characters) by jonadab (Parson) on Sep 09, 2003 at 03:08 UTC
`$data =~ s/</&lt;/g;` If you're going to do that, you may just about as well do it all the way: `use HTML::Entities; $datum = encode_entities($datum);` Though I admit your solution is probably faster and will probably still get the job done.
The problem arises when you do in fact need to allow some HTML through. Another poster suggested deciding on a list of permissible tags and stripping all others. I agree with that as far as it goes, but you also want to strip certain attributes (notably, any that start with the word 'on', case-insensitive) regardless of what tag carries them. If you're concerned about the sort of games that can cause browsers to hang, crash, or just plain not show the page, you probably also want to reject (or encode entities on) anything that doesn't meet some minimal standard of structure. It's relatively easy to check well-formedness, though if you want to allow legacy HTML4 and earlier you have to do a little more work. At minimum, though, you probably don't want to allow any tag to be closed that wasn't opened, and you almost certainly want to be sure that any table-related tags that are opened are also closed.
This starts to get messy, and personally I've gone with the approach of putting the burden on the person who is submitting the HTML: if it's not well-formed, I pass it through encode_entities(), warn them that I've done so, and provide a link to an explanation of what well-formedness means and why it's useful. Because of the way browsers automatically decode entities, even in the values in form elements, they can then directly edit their content to fix it up, and if they get it well-formed on the next submission it'll go into the database as-is. Whether you can take this approach will depend somewhat on how much burden of quality you're willing to place on the people writing the HTML in question.
Re: Back to acceptable untainted characters by genecutl (Beadle) on Sep 08, 2003 at 21:57 UTC
For my web site, I wrote a Perl module that cleans up user-submitted HTML by only allowing sanctioned HTML tags to pass through. So you can allow `<P>` and `<b>` but nothing else, if you want. I intended to submit it to CPAN, but never had the time. Anyway, you can download it here: HTMLCleaner.pm. It's got POD documentation. And if anyone wants to develop it, they are free to do so.
Re: Re: Re: Back to acceptable untainted characters by sgifford (Prior) on Sep 08, 2003 at 15:13 UTC
It depends on whether `$data` is under the user's control or not. If it is, it's best to prevent all HTML. I usually use an HTML escaping module, like the `escapeHTML` function provided by `CGI`. Otherwise, if a malicious user can trick a legitimate user into setting `$data` to some JavaScript code, the malicious user can steal cookies for your domain, or any other information in the page or the form.
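To make the idea concrete, here's a rough sketch of what HTML escaping does to `$data` before it reaches the page. The escape_html routine below is a hypothetical stand-in written for illustration, not the code of `CGI` or any other module; in real code I'd reach for a maintained escaping function instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real escaping routine, shown only to make the idea concrete.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    # One pass over the string, so an "&" produced by a replacement is never escaped twice.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Script tags in the data come out as inert text instead of running in the browser.
my $data = '<script>alert(document.cookie)</script>';
print "<td>", escape_html($data), "</td>\n";
```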
Re: Re: Back to acceptable untainted characters by bradcathey (Prior) on Sep 07, 2003 at 12:37 UTC
Thanks much graff and sgifford for your in-depth replies. I think I'll have to print them out and study them closely. Great stuff. I look forward to the day when I can be as helpful to some other fledgling coder.
Re: Re: Back to acceptable untainted characters by bunnyman (Hermit) on Sep 08, 2003 at 19:35 UTC
`eval 'print OUT "$unsafe_input" or die';` I do not think this is going to do what you said it does. Variables are not interpolated inside 'single quotes', so the eval only interpolates the variable one time. So, even if $unsafe_input='$DATABASE_PASSWORD', the password would not be printed. On the other hand, it would print the password if the code was like this: `eval "print OUT \"$unsafe_input\" or die";`
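The difference can be checked with a small sketch. The password value and variable contents below are made up for the demonstration, and I've dropped the OUT filehandle so each eval simply returns the string it builds.

```perl
use strict;
use warnings;

our $DATABASE_PASSWORD = 'hunter2';              # hypothetical secret
my $unsafe_input       = '$DATABASE_PASSWORD';   # literal text typed by the user

# Single-quoted eval: the code text still says "$unsafe_input", which
# interpolates once and yields the user's literal text.
my $from_single = eval '"$unsafe_input"';

# Double-quoted eval: the user's text is pasted into the code before eval
# runs, so the code becomes "$DATABASE_PASSWORD" and is interpolated again.
my $from_double = eval "\"$unsafe_input\"";

print "single-quoted eval: $from_single\n";
print "double-quoted eval: $from_double\n";
```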