If you're testing for valid paths, but unable or unwilling to determine the originator OS

Determining the OS in the general case is very hard, but the OP says he only needs to allow for Win32 or *nix, which narrows things down considerably. For example, "NOD$DKA0:[SYS$MANAGER.LOGS]POWERCHUTE.LOG" (a VMS-style path), which in the general case could be a valid file pathname, probably is not one if we limit things to Win32 or *nix. (At least, no sensible person would name a file that on *nix, and on Win32 it's impossible altogether.)

That said, his code doesn't go into the detail needed to really check the path's validity. First, you probably want two separate regexes: one for paths with forward slashes and one for paths with backslashes. (Either type can start with a drive letter, since Windows does accept forward slashes; it just usually doesn't use them. That assumes you really do want to allow drive letters; more on that in a moment.) Also, there must be at least one non-separator character between any two consecutive separators. The exception is the leading double slash of a CIFS path ("//node/share/path/file.ext"), so if you have to support those, you want a third regex for them.
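A minimal sketch of that three-regex approach follows. It assumes you only care about separator structure, not the per-OS rules about which characters are legal in a name, and the names `$FWD`, `$BACK`, `$UNC` and `looks_like_path` are hypothetical, not part of the module under discussion.

```perl
use strict;
use warnings;

# Hypothetical patterns: optional drive letter, optional leading separator,
# then one or more components separated by exactly one separator each.
# Components may not contain either separator or a NUL byte.
my $FWD  = qr{\A(?:[A-Za-z]:)?/?[^/\\\0]+(?:/[^/\\\0]+)*\z};
my $BACK = qr{\A(?:[A-Za-z]:)?\\?[^/\\\0]+(?:\\[^/\\\0]+)*\z};

# CIFS/UNC: a leading double separator, a node name, then at least a share.
my $UNC  = qr{\A(?://|\\\\)[^/\\\0]+(?:[/\\][^/\\\0]+)+\z};

# Hypothetical helper: true if $path is well-formed under any of the three.
sub looks_like_path {
    my ($path) = @_;
    return 0 unless defined $path;
    return ($path =~ $FWD || $path =~ $BACK || $path =~ $UNC) ? 1 : 0;
}
```

These patterns only enforce the "something between each pair of separators" rule. A string with no separators at all, such as the VMS-style example above, still passes as a single *nix filename component.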

Also, while you said you don't need to check that the file exists, you probably do want to check that the directory it's supposedly in exists. That is, strip off everything after the last path separator and make sure what remains is an existing directory. If you don't, your documentation needs to say that the module doesn't guarantee even that.
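One way to do that directory check, as a sketch: the hypothetical helper below treats both / and \ as separators (matching the Win32-or-*nix assumption), strips everything after the last one, and tests the remainder with Perl's `-d` file test.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the directory part of $path exists.
sub parent_dir_exists {
    my ($path) = @_;
    # Greedy (.*) runs up to the LAST / or \; that prefix is the directory.
    my ($dir) = $path =~ m{\A(.*)[/\\]};
    $dir = '.' unless defined $dir;   # bare filename: current directory
    $dir = '/' if $dir eq '';         # "/file": the root directory
    return -d $dir ? 1 : 0;
}
```

On *nix this also splits at backslashes, which are legal filename characters there; if that matters, use `File::Basename`'s `dirname` instead, which follows the current platform's rules.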

Randall brings up a good point with "..", but from looking at your code (which allows regexes), it seems to me that you probably did intend to allow it. If so, the module documentation needs to be very clear about that, because allowing access upward to directories outside the current directory is not entirely consistent with untainting untrusted data. If, besides ensuring that the string is a real path, you also need to ensure that it doesn't point to a file that shouldn't be accessible this way, then you have to reconsider whether you want to allow "..", absolute paths, and drive letters. Disallowing them removes both drive letters and (more usefully) double-slash network paths from consideration, which should if anything simplify matters. You do then have to check specifically for "..", though.
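If you go the restrictive route, a sketch of the check plus the untainting step might look like this. The helper name is hypothetical. The final capture is the usual way to untaint a value under `perl -T` (see perlsec), and it is only safe because the checks before it have already rejected absolute paths, drive letters, network paths, and any ".." component.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $path if it is a
# relative path that cannot climb out of the current directory, else undef.
sub untaint_relative_path {
    my ($path) = @_;
    return undef unless defined $path && length $path;
    return undef if $path =~ /\0/;             # embedded NUL byte
    return undef if $path =~ m{\A[/\\]};       # absolute, or //node/share
    return undef if $path =~ m{\A[A-Za-z]:};   # drive letter
    # Reject ".." as a whole component (but not names like "a..b").
    return undef if grep { $_ eq '..' } split m{[/\\]}, $path;
    # Capturing through a regex is what clears the taint flag.
    my ($clean) = $path =~ m{\A(.+)\z}s;
    return $clean;
}
```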

sub A{while($_[0]){$d=hex chop$_[0];foreach(1..4){$_[1].=($d %2)?1:0;$d>>=1;}}return$_[1];}sub J{$_=shift;while($_){$c=0; while(s/^0//){$c++;}s/^1//;$_[1].=(' ','|','_',"\n",'\\','/' )[$c]}@_}$PH="16f6da116f6db14b4b0906c4f324";print J(A($PH));

In reply to Re: Filepath validation and untainting by jonadab
in thread Filepath validation and untainting by hardburn