
Security techniques every programmer should know

by Juerd (Abbot)
on Dec 26, 2004 at 22:47 UTC ( [id://417490] : perlmeditation )

These have all been mentioned numerous times, but many programmers still don't understand the risks of using power tools. Because I think everyone knows why it is important to program with security in mind, I'll just begin without any further introduction.

Know your environment

Perl is written in C and doesn't prevent you from shooting your own feet. This has some very dangerous implications that not every Perl programmer is aware of. Even though the language you use is Perl, you still need some basic knowledge of C to create secure programs.

The platform perl runs on is also important. Linux asks for different security measures than Win32. Even the filesystem that is used can be important: are FooBar and foobar the same file, or not? If you program for one specific platform, make it work only on that one.

Read documentation!

Your program will be used by others

Or maybe not. But always assume the worst. If not because use by others is a probable future, then to keep yourself focussed on the important issues.

Security comes first

Your boss may tell you that the first priority is that everything works, but it is your job to tell him he's wrong. If something is wrong, make sure the program dies before more goes wrong. It's better to have a program that does nothing at all than to have a program that does everything that is expected from it and provides a backdoor for evil-doers as a free bonus feature.


Encode and escape!

Knowing your environment includes knowing all the protocols and file formats used. If you are a web programmer, then you should know HTTP, HTML and probably CSS and JavaScript too.

Of every string that comes in, you should know the character set and, more importantly, its encoding. Before doing anything with the data, it's best to convert it to Perl's internal format.

For example, if your input is %-encoded UTF-8, use:

use Encode qw(decode);
use URI::Escape qw(uri_unescape);
my $string = decode 'utf-8' => uri_unescape $input;

If you don't know exactly how to unescape the incoming data, use a module, like CGI (or its faster equivalent, CGI::Simple) and let that handle it for you.

The reverse is also true. Before outputting a string, make sure it is in the right format. For example, to output the $string we just decoded in an utf-8 encoded HTML-document, we can use:

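use Encode qw(encode);
use HTML::Entities qw(encode_entities);
# Escape the characters first, then encode the result to UTF-8 octets.
# Escaping after encoding would turn every high octet into its own entity.
my $output = encode('utf-8', encode_entities($string));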

So even though input and output are utf-8, we still explicitly decode and encode it from and to Perl's internal format. If the output was part of a URL, we'd also unescape and then re-escape the data. This is to make sure no strange octet (byte) slips through. Another benefit is that in between, you have a string that normal Perl functions can manipulate without needing to have special facilities to handle a certain character set or encoding.

(Note: Perl's internal format happens to also be utf-8, but you should never assume this. Always explicitly decode and encode!)
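As a minimal sketch of why the explicit decode and encode steps matter, the following uses only the core Encode module. The sample octets and variable names are made up for illustration, and the URL and HTML escaping steps are left out to keep it short.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the five octets of "café" encoded as UTF-8.
my $octets = "caf\xc3\xa9";

# Decode once at the boundary; from here on we work with characters.
my $string = decode('UTF-8', $octets);

printf "octets: %d, characters: %d\n", length $octets, length $string;

# Normal string functions now operate on characters, not on raw bytes.
my $upper = uc $string;

# Encode once on the way out.
my $output = encode('UTF-8', $upper);
printf "output octets: %d\n", length $output;
```

The octet count and the character count differ here (5 versus 4), which is exactly the kind of mismatch that goes unnoticed when a program treats undecoded input as text.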

If you don't escape properly, your program is prone to injection attacks. These include, but are not limited to:

  • SQL injection
  • HTML/Javascript injection (Cross site scripting, XSS)
  • open injection (to avoid, use 3-arg open, not 2-arg)
  • Shell command line injection
  • SMTP injection (don't let others abuse your machine as a spam gateway!)

Every output format requires its own escaping. Even better than escaping data, though, is preventing interpolation when possible by using placeholders (DBI) or a list variant of a function (system, exec, open). This skips escaping and unescaping by using a more direct mode of communication. If internally it is still implemented as escaping+unescaping (DBD::mysql), at least you know knowledgeable people take care of it.
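A small sketch of the list variant, assuming a Unix-like system. The hostile input is invented, and the running perl ($^X) stands in for whatever external command you would really call. Because the arguments are passed as a list, no shell ever gets to interpret them.

```perl
use strict;
use warnings;

# Hypothetical hostile input full of shell metacharacters.
my $input = q{hello; echo INJECTED `id` $(id) | cat};

# List form of a piped open: the child receives $input as one literal
# argument. $^X (the running perl) is only a harmless stand-in for an
# external command.
open(my $child, '-|', $^X, '-e', 'print $ARGV[0]', $input)
    or die "cannot start child: $!";
my $echoed = do { local $/; <$child> };
close($child) or die "child failed: status $?";

my $verdict = $echoed eq $input ? 'argument arrived untouched'
                                : 'argument was altered';
print "$verdict\n";
```

The same idea applies to system and exec: pass the program and each argument as separate list elements rather than building one command string.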

Null bytes are scary

Several control characters are scary, because they often have special meaning in certain string formats, but the null byte is the most scary of all. In C, a null byte (\0) indicates where a string ends. However, in Perl, it's just a normal character. This has advantages and disadvantages. The disadvantages are more important to be aware of. Many of Perl's functions are implemented using C functions, and in general, you can (and SHOULD!) assume they're not removing the null bytes for you.

Suppose you have written a CGI-script that does nothing more than display a page from the current directory. Storing data in the working directory is often a mistake in itself, but for this contrived example, let's ignore that.

#!/usr/bin/perl -w
# this is page.cgi
use strict;
use CGI::Simple;
use File::Slurp qw(read_file);

my $cgi = CGI::Simple->new;
my $page = $cgi->param('page');
die if $page =~ m[/];  # Disallow pages from other folders
print "Content-Type: text/html\n\n";
print read_file "$page.html";

You disallow anything that has a slash in it, and ".html" is used in the read_file call, so only .html-files from the current directory can be read, right?

Wrong. Just poisoning the data with a null byte is enough to evade the .html restriction. URI-encoded, a null byte is %00, so a request with page=page.cgi%00blah! does the trick.

The underlying function is a C function. It thinks the string ends where the null byte is. So it opens page.cgi and ignores the "\0blah!.html" part. But wasn't File::Slurp a pure Perl module? Yes, it is. But it uses sysopen internally! Don't let the "sys" part fool you: open uses the same internal C function.

Instead of reading through every module and Perl's source to find out what it uses, just remove all null bytes unless you have a good reason to keep them around. While you're at it, remove other control characters as well.

$string =~ tr/\x00-\x09\x0b\x0c\x0e-\x1f//d;

I skipped 0x0a and 0x0d because they are LF (line feed) and CR (carriage return), used for line endings. Depending on the application you write, you may need to keep more characters, like horizontal tabs (0x09), vertical tabs (0x0b) and form feeds (0x0c), which this transliteration removes.
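The transliteration can be wrapped in a small helper. This is only a sketch: strip_control is a hypothetical name, not a function from any module, and the sample value mirrors the poisoned parameter from the page.cgi example.

```perl
use strict;
use warnings;

# strip_control() is a hypothetical helper name, not part of any module.
# It returns a copy of the string without null bytes and other control
# characters, keeping only LF (0x0a) and CR (0x0d).
sub strip_control {
    my ($string) = @_;
    $string =~ tr/\x00-\x09\x0b\x0c\x0e-\x1f//d;
    return $string;
}

my $poisoned = "page.cgi\0blah!";
my $clean    = strip_control($poisoned);

printf "before: %d characters, after: %d characters\n",
    length $poisoned, length $clean;
```

With the null byte gone, appending ".html" gives a name that no longer stops at page.cgi when it reaches the underlying C function.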

Taint mode

A good way to make sure you test each string before using it externally is to use Perl's taint mode. It is invoked with -T. The previous example would need only one small change.

#!/usr/bin/perl -wT
...
my ($page) = $cgi->param('page') =~ /^(\w+)\z/ or die;
print "Content-Type: text/html\n\n";
print read_file "$page.html";

Note that you should NEVER blindly use . or [^...] in your untaint regex. Whitelisting is much safer than blacklisting, and should have preference. For example ^(.+)\z and ^([^/.]+)\z still allow the dangerous null byte. I use \z instead of $, because $ allows \n (newline) just before the end. Know your tools, so learn to use regexes properly!
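The claims about these patterns are easy to check. The sketch below runs each regex against two made-up inputs, one containing a null byte and one ending in a newline, and reports whether the pattern accepts it.

```perl
use strict;
use warnings;

# Hypothetical inputs: one with an embedded null byte, one with a
# trailing newline.
my $with_null    = "page\0blah";
my $with_newline = "index\n";

my @checks = (
    [ '^(.+)\z on a null byte',     $with_null    =~ /^(.+)\z/      ? 'accepted' : 'rejected' ],
    [ '^([^/.]+)\z on a null byte', $with_null    =~ m{^([^/.]+)\z} ? 'accepted' : 'rejected' ],
    [ '^(\w+)\z on a null byte',    $with_null    =~ /^(\w+)\z/     ? 'accepted' : 'rejected' ],
    [ '^(\w+)$ on a trailing LF',   $with_newline =~ /^(\w+)$/      ? 'accepted' : 'rejected' ],
    [ '^(\w+)\z on a trailing LF',  $with_newline =~ /^(\w+)\z/     ? 'accepted' : 'rejected' ],
);

printf "%-32s %s\n", @$_ for @checks;
```

Only the whitelist pattern anchored with \z rejects both inputs.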


Please, add your own generic security related advice below. Preferably with examples of how easy it is to get wrong. There is much more than I have just mentioned. If you know relevant PM nodes or external URLs, link to them. Let's have all the important information in one place.

But realise that knowing what you're doing, and thus reading documentation, is much better than reading only about the risks involved.

Juerd # { site => '', plp_site => '', do_not_use => 'spamtrap' }

Considered by demerphq: "Section titles are too big"
Unconsidered by davido: No consensus in vote: (keep/edit/delete) = (26/35/0)
Considered by kutsu: "Edit: Move to tutorials" Vote: 4/10/0
Unconsidered by davido: Juerd knows where to post tutorials. He chose to post this as a Meditation. Let's respect the author's decision. Juerd's reaction: the original idea was to consider it for a move, or to post a new node, after having received lots of additional sections. However, I expected much more response than I got. Making this node a tutorial in its current state may give some people the impression that all important security issues are discussed, which is far from true.

Replies are listed 'Best First'.
Re: Security techniques every programmer should know
by dws (Chancellor) on Dec 27, 2004 at 04:14 UTC

    To that, add

    Know where to look for Exploit notices, and stay up-to-date

    That includes periodically reviewing the change logs on the CPAN modules you use.

Re: Security techniques every programmer should know
by Aristotle (Chancellor) on Dec 27, 2004 at 03:10 UTC

    Dealing with nuls, my preference would be to consider them an end-of-string marker.


    After all, that's what the underlying system calls will get to see.

    Makeshifts last the longest.

Re: Security techniques every programmer should know
by Jaap (Curate) on Dec 27, 2004 at 09:47 UTC
    Instead of blacklisting with
    $string =~ tr/\x00-\x09\x0b\x0c\x0e-\x1f//d;
    one should whitelist, allowing certain characters and forbidding the rest:
    if ($string =~ m/^([a-zA-Z0-9_]+)\z/) {
        my $safeString = $1;  ### also untainted now
    }
    OK, you say that in the Taint part, but I would add it to the "Null bytes are scary" part.

      Your code will call anything with whitespace an unsafe string. While that's much better than no checking, how about:

      $string =~ s/[^\w\s]+//g; ##add other allowed chars as needed
      That will sanitize all strings to contain only letters, digits, the underscore and whitespace. A more complete regex (which would still not include unicode or international chars) would be:
      $string =~ s/[^\w\s\!\@\#\$\%\^\&\*\(\)\\\`\~\-\+\=\,\.]+//g;
      (Yes, there's more escaping there than strictly necessary.) Suddenly, that transliteration is looking a lot easier to maintain. If your allowed set is "everything but nulls and control chars", then you're better off explicitly excluding the known control-char set.

      Denying all, then allowing is a good general rule of thumb. But, in this case, the "dangerous" items are a fixed set while the "safe" items are much more variable -- so it makes sense to simply remove that which is dangerous.

      Update=> Aristotle reminded me that, as \s includes \n, these regexes will not strip newlines; that means strings sanitized with these will be unsafe if executed with a shell (e.g. system("$string");). This further shows that inclusion-matching isn't as good, in this case, as merely stripping "bad" data out.

      Anima Legato
      .oO all things connect through the motion of the mind

        \w matches different things depending on your locale. If you have a German locale, for instance, it will match ß.

        The danger of using perl's shortcut character classes, as was pointed out to me by DrHyde.
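        Locale behaviour depends on the system, so here is a sketch that uses Unicode rules as a stand-in to show the same effect. It assumes Perl 5.14 or later, where the /u and /a regex modifiers are available; the variable names are only for illustration.

```perl
use 5.014;
use warnings;

my $eszett = "\x{df}";   # LATIN SMALL LETTER SHARP S

# The same \w gives different answers depending on the rules in effect.
my $unicode_rules = $eszett =~ /^\w\z/u           ? 'matches' : 'does not match';
my $ascii_rules   = $eszett =~ /^\w\z/a           ? 'matches' : 'does not match';
my $explicit_set  = $eszett =~ /^[A-Za-z0-9_]\z/  ? 'matches' : 'does not match';

print "\\w with /u:     $unicode_rules\n";
print "\\w with /a:     $ascii_rules\n";
print "[A-Za-z0-9_]:   $explicit_set\n";
```

        Spelling out the character class, or adding /a, keeps the whitelist from growing behind your back.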

        "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce

        Are you sure you want to use \s? That includes \n, you know.

        Makeshifts last the longest.

Re: Security techniques every programmer should know
by Anonymous Monk on Dec 27, 2004 at 09:12 UTC
Re: Security techniques every programmer should know (Security References)
by eyepopslikeamosquito (Archbishop) on Dec 29, 2004 at 02:06 UTC
Re: Security techniques every programmer should know
by ihb (Deacon) on Dec 27, 2004 at 23:45 UTC

    Taint mode does not help against null bytes (or any other bytes) in your read_file "$page.html" example. Reads are not checked for tainted data. Writes are though, so write_file "$page.html" would've been stopped by the -T switch.

    In short, I'd like to add this: Don't think -T will do the job for you! Just think it may help you if you slipped up.
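    A sketch of how to see this for yourself, assuming a Unix-like system. It starts a child perl with -T and hands it a scratch file name on the command line (command-line arguments are tainted), then tries a read and a write with that name. The variable names and the probe code are made up for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to try opening. Its path reaches the child via @ARGV,
# and command-line arguments are tainted under -T.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Code run by a child perl that is started with -T.
my $probe = q{
    my $file = shift @ARGV;
    print "read: ",  (eval { open(my $in,  '<',  $file) or die; 1 } ? "allowed" : "blocked"), "\n";
    print "write: ", (eval { open(my $out, '>>', $file) or die; 1 } ? "allowed" : "blocked"), "\n";
};

# List-form piped open, so no shell quoting is involved.
open(my $child, '-|', $^X, '-T', '-e', $probe, $tmp_path)
    or die "cannot start child: $!";
my $report = do { local $/; <$child> };
close $child;

print $report;
```

    Whatever -T reports, the file name should still be validated explicitly before it is used.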


    See perltoc if you don't know which perldoc to read!
    Read argumentation in its context!

Re: Security techniques every programmer should know
by fizbin (Chaplain) on Dec 30, 2004 at 15:41 UTC
    This is more secure shell programming than secure perl programming, per se, but when passing arguments to an external command, in addition to the advice above about general control-character cleaning and proper escaping, be wary of cases where the passed argument might be interpreted as an option. For example, consider this code that might be part of a man2html gateway:
    # $page and $section are parameters from the user that have been
    # cleaned of 0 bytes and obvious control characters
    my $mantext = '';
    my $status;
    my $pid = open(KID_STDOUT, "-|");
    if (not defined $pid) {
        die "cannot fork: $!; bailing out";
    }
    if ($pid) { ## parent
        while (<KID_STDOUT>) { $mantext .= $_; }
        close(KID_STDOUT);   # reaps the child, which is what sets $?
        $status = $?;
    } else {
        close(STDIN);
        open(STDERR, '>&STDOUT');
        if ($section) { exec('/usr/bin/man', $section, $page); }
        else          { exec('/usr/bin/man', $page); }
    }
    # now reformat $mantext and display it.
    Now, there are some nice security plusses in this code - the use of the many-arg form of exec, for example, avoids a whole host of shell-escaping issues. However, this gives a potential attacker shell access on any system whose man command allows the -P option. (quid vide) All an attacker needs to do is pass in
    a section value that starts with -P and names a command of their choice, together with "cat" as the page,
    as part of the URL, and their command will be executed. (And fed the "cat" manpage as input, but that's immaterial)

    The general lesson here is that options change the behavior of external commands in ways you don't expect; don't allow the user to send options to external commands. Fortunately, with almost every unix command passing a '--' will prevent subsequent arguments from being interpreted as options, so a fixed version of the above code could read:

    # $page and $section are parameters from the user that have been
    # cleaned of 0 bytes and obvious control characters
    my $mantext = '';
    my $status;
    my $pid = open(KID_STDOUT, "-|");
    if (not defined $pid) {
        die "cannot fork: $!; bailing out";
    }
    if ($pid) { ## parent
        while (<KID_STDOUT>) { $mantext .= $_; }
        close(KID_STDOUT);   # reaps the child, which is what sets $?
        $status = $?;
    } else {
        close(STDIN);
        open(STDERR, '>&STDOUT');
        if ($section) { exec('/usr/bin/man', '--', $section, $page); }
        else          { exec('/usr/bin/man', '--', $page); }
    }
    # now reformat $mantext and display it.
    As an aside, note that the following code contains the same hole as the initial code:
    my $qpage = quotemeta($page);
    my $qsect = quotemeta($section || '');
    exec("/usr/bin/man $qsect $qpage");
    The issue is not shell escaping - the issue is that when calling external commands, be aware that many commands use arguments beginning with "-" to mean "radically alter your behavior in some fashion". This leads to behavior you can't predict ahead of time, which means that guarding against it is almost impossible if you allow options to be passed along.
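    The effect of '--' can be sketched without running man at all. Below, the core Getopt::Long module plays the part of an external command's option parser. The function parse_like_a_command, the --pager option and the sample values are all hypothetical stand-ins, not how man itself is implemented.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# parse_like_a_command() is a hypothetical stand-in for the option
# parsing that an external program does on its argument list.
sub parse_like_a_command {
    my @argv = @_;
    my $pager;
    GetOptionsFromArray(\@argv, 'pager=s' => \$pager);
    return ($pager, @argv);
}

# Hypothetical user input that happens to look like an option.
my $section = '--pager=/tmp/evil';
my $page    = 'cat';

my ($pager_without)     = parse_like_a_command($section, $page);
my ($pager_with, @rest) = parse_like_a_command('--', $section, $page);

printf "without --: pager is %s\n",
    defined($pager_without) ? $pager_without : 'unset';
printf "with --:    pager is %s, operands are %s\n",
    defined($pager_with) ? $pager_with : 'unset', join(' ', @rest);
```

    With the terminator in place, the hostile value is treated as an ordinary operand rather than as an instruction to the command.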

    Note that on an MS windows platform, (and, I suppose, on VMS too) some external commands may treat arguments beginning with '/' as options. Unfortunately, I don't know of any standard way to prevent that as with the '--' common on unix; on those platforms you'll just have to be careful to strip leading / characters in cases where the variables are being used in a way that could pass unwanted options to an external command.

    -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
Re: Security techniques every programmer should know
by Juerd (Abbot) on Dec 28, 2004 at 22:28 UTC