doran has asked for the wisdom of the Perl Monks concerning the following question:

I have a (relatively) simple script which checks a directory every few seconds for any files. If a file is there, it checks to make sure its name matches the format we're looking for (something like foo.bar.baz.xml). If it does match, the file gets processed. If it doesn't match, the file is to be moved to a different directory for later inspection.

The challenge I'm facing is that I need to untaint the 'bad' filename before I'm allowed to move it (actually rename, which in this case moves the file). Well, since it's a 'bad' file, I'm not sure what its name looks like. In fact that's about the only thing I know about the filename, that it doesn't match what I'm looking for.

So, what is the best practice here? Should I just match anything (i.e. $filename=~/^(.+)$/; $file=$1;), or maybe turn taint checks off for that little bit of the script which moves the file (is that even possible, since taint checks are enabled from the command line)? Matching everything raises a big old red flag in my head, but I don't really see any other way around it.

I should mention that by the time the script gets to this point, I've performed at least a couple of tests on the file (e.g. -f $filename) to determine that it does look like a file to the system. So that may limit my vulnerability. Lemme know.

The one thing I'm sure of is that I'd like that directory empty before it gets checked again.

Thanks for any insights.
db

Re: Untainting 'bad' filenames
by AgentM (Curate) on Dec 08, 2000 at 00:49 UTC
    Taint checking is designed more for things like insecure user input. If your directory is secure (not world-readable, usable only by local trusted users, etc.), then taint checking doesn't offer much. You should check the files yourself with your own regex to ensure that they meet the limits of a certain length and file naming convention. If not, do not process them. If this is a directory where some generated files are created by user "filebot", then you probably don't even need to worry about that. I don't see taint checking as particularly useful here, since I imagine that you are using a Perl built-in after a series of readdirs, right? Not executing shell commands, which should be avoided anyway, right?
    AgentM Systems nor Nasca Enterprises nor Bone::Easy nor Macperl is responsible for the comments made by AgentM. Remember, you can build any logical system with NOR.
      Yea, the only thing that should be writing files to the directory I'm checking is a custom built (not by me) ftp server. If I start encountering files that don't untaint easily, I've got real problems and will have to deal with it accordingly.

      Thanks to everybody for the input.

      db

Re: Untainting 'bad' filenames
by Fastolfe (Vicar) on Dec 08, 2000 at 00:52 UTC
    You're probably best off doing something like this:
    ($untainted) = $file =~ /^(.*)$/s;
    Just be certain you only use $untainted in a "secure" way. I suspect a rename call is fine, but don't think of using it in, say, system.

    The example you used ($filename=~/^(.+)$/; $file=$1;) breaks if there's a newline in the filename and if it doesn't successfully match the filename, you could be setting $file to something you do not expect (since an unsuccessful match leaves $1 to what it was set to before). Taint-checking probably wouldn't pick up on this mistake.
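
    As a rough sketch of the points above (not code from the thread): the capture is taken in list context, so a failed match yields undef instead of a stale $1, and \A ... \z keeps newlines from truncating the name. The helper name untaint_basename and the extra rule rejecting '/' and NUL bytes are assumptions added for illustration, a slightly tighter variant of matching everything.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of a bare filename,
# or undef if the name is empty, is '.' or '..', or contains '/' or NUL.
sub untaint_basename {
    my ($name) = @_;
    return undef if !defined $name or $name eq '.' or $name eq '..';

    # List assignment receives the capture only when the match succeeds,
    # so a leftover $1 from an earlier match can never sneak in.
    my ($clean) = $name =~ m{\A([^/\0]+)\z};
    return $clean;
}

for my $name ('foo.bar.baz.xml', 'weird name.tmp', '../etc/passwd', '') {
    my $clean = untaint_basename($name);
    printf "%-18s %s\n", "'$name'", defined $clean ? 'accepted' : 'rejected';
}
```

    The returned value should still only go to built-ins such as rename, never into a shell command string.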

      Yikes, newlines! Thanks. I forgot about the //s.

      BTW, I make sure the $1 is localized by wrapping the regex in its own block. That's something I learned a couple months ago.

        That ensures that the value of $1 from a successful match does not leak outside the block, but it doesn't protect you from $1 from an earlier successful match leaking into the block:
        #!/usr/local/bin/perl -w
        use strict;

        $_ = 'perlmonks.org';
        for my $re (qw/monks minks/) {
            /(perl)/;
            print "$1\n";
            {   # new block for regex
                /($re)/;
                print "$1\n";
            }
        }
        __END__
        perl
        monks
        perl
        perl
        It's generally necessary to check whether a regex succeeded before using any of the regex special variables.
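
        A minimal sketch of that advice, reusing the same two patterns: $1 is read only inside the branch where the match is known to have succeeded. The variable names $text and @seen are made up for this example.

```perl
use strict;
use warnings;

my $text = 'perlmonks.org';
my @seen;

for my $re (qw/monks minks/) {
    if ($text =~ /($re)/) {
        # Safe: $1 was set by the match that was just tested.
        push @seen, $1;
    }
    else {
        # No match, so $1 is deliberately not consulted here.
        push @seen, "no match for $re";
    }
}

print "$_\n" for @seen;
```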
Re: Untainting 'bad' filenames
by turnstep (Parson) on Dec 08, 2000 at 00:49 UTC

    Well, I wouldn't match "anything" but if you know what the format is roughly going to look like, you could limit it to something like this:

    if ($filename =~ m#^([\w.]+)\z#) {
        print "Ready to process $1...\n";
    }
    else {
        print "Skipping $filename: bad characters detected\n";
        ## Log to a file, email someone, raise a ruckus, etc.
    }
Re: Untainting 'bad' filenames
by dws (Chancellor) on Dec 08, 2000 at 01:58 UTC
    rename doesn't do shell expansion. system does, but only when you pass it a single argument.
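
    To illustrate the difference, here is a small sketch, assuming a made-up filename that contains a shell metacharacter. The running perl ($^X) stands in for whatever external program would really be called; the list form hands the name over as one literal argument, with no shell involved.

```perl
use strict;
use warnings;

# A filename containing shell metacharacters (invented for this demo).
my $name = 'report; echo INJECTED';

# List form: the program is executed directly. The child exits 0 only if
# it received exactly one argument that still contains the ';'.
my $status = system($^X, '-e',
    'exit(@ARGV == 1 && $ARGV[0] =~ /;/ ? 0 : 1)', $name);
print "list form exit status: ", $status >> 8, "\n";

# Single-string form: a string with metacharacters is passed to the shell,
# which would treat the ';' as a command separator. Left commented out.
# system("echo $name");
```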

    One reason to "taint check" filenames, particularly if the starting directory must be writeable by people or scripts that you can't necessarily trust, is to contain damage by preventing hacked filenames from moving downstream where they might be handled by other scripts or applications.

    So it looks like what you need is a regex that will sanity check a filename given some set of rules for whatever platform you're on. (E.g., don't allow shell metacharacters).

    You haven't said what you need to do if a filename flunks checking. Delete it?

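    if ( $filename =~ m/^([A-Za-z0-9.]+)\z/ ) {
       my $safe = $1;   # captured from the match, so no longer tainted
       rename $safe, "$newdir/$safe" or die "rename $safe: $!";
    } else {
       # NOTE: under -T, $filename is still tainted in this branch
       unlink $filename or die "unlink $filename: $!";
    }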
    
Re: Untainting 'bad' filenames
by PsychoSpunk (Hermit) on Dec 08, 2000 at 00:57 UTC
    The challenge I'm facing is that I need to untaint the 'bad' filename before I'm allowed to move it (actually rename, which in this case moves the file). Well, since it's a 'bad' file, I'm not sure what its name looks like. In fact that's about the only thing I know about the filename, that it doesn't match what I'm looking for.

    But, you do know the filename. You have to know the filename in order to know that it doesn't match the pattern you want files in this directory to be named. After that, take AgentM's advice and just move it, unless the directory is world-readable.

    ALL HAIL BRAK!!!

      The program knows the filename, but I can't anticipate all permutations in advance (so I can write a tight regex) and a filename from a readdir must be (I'm pretty sure) untainted before I can move it (assuming I have Taint warnings turned on).

      But like I mentioned above, since the directory I'm reading from is supposed to be reasonably secure, if I start encountering really weird filenames, I have bigger problems than just untainting.

      thanks

        The program knows the filename, but I can't anticipate all permutations in advance

        I'm not sure about the taint feature in this respect, so I'll refrain from commenting further on that issue. But with the filename, when I say you do know the filename, it is in relation to the script. IOW, your spec seems to state that you do know the format of the filename, but not the filename.

        But in order to compare the filename to a regex, you (the script is simply an extension of you; be the script :) have to know the filename. The regex shouldn't check all permutations of the name. It should check valid permutations.

        In which case, you can write a very tight regex, since it is based on your valid filename. I think you're taking too many variables into account here with the solution of your problem. I see a single variable: the filename, and a single control: the format the filename should match. This makes it a very binary operation. It matches or it doesn't. What is it that I'm missing in this discussion? (This is purely discussion, since it seems as though someone may have provided a solution that you will use.) I'm interested in case I ever see this problem myself.

        ALL HAIL BRAK!!!

Re: Untainting 'bad' filenames
by elwarren (Priest) on Dec 08, 2000 at 04:59 UTC
    Sounds to me like another file upload script. If you don't have control over where the file is coming from, then how do you know that it's not still being written to? You wouldn't want to move it before it's done...

      elwarren brings up a good point about checking to see whether the file is still in the middle of being ftp'ed by another process. If you are on a unix (or Solaris, Linux, etc.) box, check out the fuser command. See your local manpage for more info on it.

        Sometimes I get around this by writing out a second file after the first is completely written. Then the process polls the directory looking for filename.whatever.done. When the .done file shows up, I know it's safe to move the file I actually want.

        Reading some fresh posts it looks like you're grabbing files from a custom ftp server, so this really isn't a solution for your problem...
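
        For cases where the sender can be changed, the .done marker idea might look roughly like this. The function name ready_files and the sample filenames are assumptions for illustration; the demo builds its own temporary directory, so it does not touch real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: list the files in $dir that have a matching
# "<name>.done" marker next to them.
sub ready_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my %present = map { $_ => 1 } @names;
    return sort grep {
        !/\.done\z/ && -f "$dir/$_" && $present{"$_.done"}
    } @names;
}

# Demo: a.xml has its marker, b.xml is (pretend) still being written.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.xml', 'a.xml.done', 'b.xml') {
    open(my $fh, '>', "$dir/$name") or die "create $name: $!";
    close $fh;
}

print join(", ", ready_files($dir)), "\n";
```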