in reply to Re: •Re: Filepath validation and untainting
in thread Filepath validation and untainting

Well, maybe the specs should change. If you're testing for valid paths, but are unable or unwilling to determine the originator OS, or whether the file exists, why validate at all, except to remove things which 'break' your code? Management might as well ask you for a module that remotely checks that the user has wiped his arse and washed his hands before entering 'tainted' data.


I can't believe it's not psellchecked

Re: Filepath validation and untainting
by jonadab (Parson) on Feb 13, 2003 at 04:41 UTC
    If you're testing for valid paths, but unable or unwilling to determine the originator OS

    Determining the OS in the general case is very hard, but the OP says he only needs to allow for Win32 or *nix, which narrows things down considerably. For example, "NOD$DKA0:[SYS$MANAGER.LOGS]POWERCHUTE.LOG", which in the general case could be a valid file pathname, is probably not if we limit things to Win32 or *nix. (At least, no sensible person would name a file that on *nix, and it's impossible altogether on Win32.)

    That said, his code does not go into the detail needed to really check the path's validity. First off, you probably want two separate regexes, one for paths with forward slashes and one for paths with backslashes. (Either type can start with a drive letter, since Windows can indeed use forward slashes; it just usually doesn't. That assumes you really do want to allow drive letters; more on that in a moment.) Also, there must be at least one non-separator character between any two separators (unless you have to support CIFS paths ("//node/share/path/file.ext"), in which case you want a third regex for those).
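
    To make the two-regex idea concrete, here is a rough sketch, not the OP's code. The names $fwd, $back and path_style are made up for illustration, and I'm assuming a colon is only allowed as part of a drive letter (which does reject some legal *nix names) and that a bare root like "/" need not match:

```perl
use strict;
use warnings;

# Hypothetical patterns: optional drive letter, optional leading separator,
# then one or more components joined by exactly ONE separator each.
# Assumption: components contain no slash, backslash, or colon.
my $fwd  = qr{\A (?:[A-Za-z]:)? /?   [^/\\:]+ (?: /   [^/\\:]+ )* /?   \z}x;
my $back = qr{\A (?:[A-Za-z]:)? \\?  [^/\\:]+ (?: \\  [^/\\:]+ )* \\?  \z}x;

# Returns 'forward', 'backward', or undef when neither pattern matches.
# A single component with no separators matches both; it is reported
# as 'forward' simply because that pattern is tried first.
sub path_style {
    my ($path) = @_;
    return 'forward'  if $path =~ $fwd;
    return 'backward' if $path =~ $back;
    return;
}
```

    Doubled separators ("/usr//bin") and mixed separators ("dir/sub\file") fail both patterns, so CIFS-style paths would need the third regex mentioned above.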

    Also, while you said you don't need to check that the file exists, you probably do want to check that the directory it's supposedly in exists. That is, you may want to strip off the part after the last path separator and make sure what remains is an extant directory. If not, your documentation needs to say that it doesn't guarantee even that.
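
    Something along these lines would do the directory check. It is only a sketch: parent_dir_exists is a name I made up, and File::Basename's dirname splits the path using the conventions of the OS the script runs on, so it will not help with a path that came from the other kind of system:

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: strip everything after the last path separator
# and report whether what remains is an existing directory.
# The file itself is never checked.
sub parent_dir_exists {
    my ($path) = @_;
    my $dir = dirname($path);    # 'file.txt' yields '.', the current directory
    return (-d $dir) ? 1 : 0;
}
```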

    Randal brings up a good point with "..", but it seems to me from looking at your code (which allows regexes) that you probably did intend to allow it. The module documentation needs to be very clear about that, though, because allowing access upward to directories outside of the current directory is not entirely consistent with untainting untrusted data. If, besides ensuring that the string is a real path, you also need to ensure that it doesn't point to a file that should not be accessible this way, then you have to reconsider whether you want to allow "..", absolute paths, and drive letters. If you disallow such things, it removes both drive letters and (more usefully) doubleslash network paths from your consideration, which should simplify matters if anything. You do then have to check specifically for "..", though.
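
    If you go the strict route, the checks could look roughly like this. Again a sketch under my own assumptions: untaint_relative_path is a hypothetical name, both separator styles are treated alike, and the final capture is there because a regex capture is how Perl launders tainted data:

```perl
use strict;
use warnings;

# Hypothetical helper: accept only relative paths with no ".." component.
# Returns the untainted path, or undef if the path is rejected.
sub untaint_relative_path {
    my ($path) = @_;
    return undef if !defined $path || $path eq '';
    return undef if $path =~ m{\A[/\\]};        # absolute or doubleslash network path
    return undef if $path =~ m{\A[A-Za-z]:};    # drive letter
    # Reject ".." only as a whole component; "b..c" is an ordinary name.
    return undef if grep { $_ eq '..' } split m{[/\\]}, $path;
    # Capture to untaint; also refuses embedded NUL characters.
    return $path =~ m{\A([^\0]+)\z} ? $1 : undef;
}
```

    Note that this says nothing about symlinks, which can still lead outside the current directory.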

    sub A{while($_[0]){$d=hex chop$_[0];foreach(1..4){$_[1].=($d %2)?1:0;$d>>=1;}}return$_[1];}sub J{$_=shift;while($_){$c=0; while(s/^0//){$c++;}s/^1//;$_[1].=(' ','|','_',"\n",'\\','/' )[$c]}@_}$PH="16f6da116f6db14b4b0906c4f324";print J(A($PH));