hardburn has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a module which (among other things) provides a series of validators on various types of user input. Each validator is a scalar holding a reference to a subroutine. The subroutine returns a list, with the first element as the untainted data, and the second element as a string (any string will do). If the validation fails, the first element is undef and the second element is an error message.

My main problem is with the filepath validator. I need to check for both *nix and Win32 filepaths. I do not need to check if the file exists or if it is among a list of valid filepaths (this is documented behavior for the module).

Current implementation, which hasn't been tested yet:

my $FILE = sub {
    my $file = shift;
    if($file =~ m!\A
            ([A-Za-z]:\\)?  # Optional DOS drive (such as 'C:\')
            (
                [/\\]*?     # Allow either '/' or '\' as a directory separator
                [-\.\w\s]+? # Allow certain characters as the filename
            )+
        \z!x) {
        return ($1 . $2, "Passed");
    }
    else {
        return (undef, "$file is not a valid filepath");
    }
};

Am I doing anything dangerously naive? Are there modules already available to do this? There are probably a lot more valid chars in many filesystems than what I'm checking above. What are some generally good special chars to check for?

•Re: Filepath validation and untainting
by merlyn (Sage) on Feb 12, 2003 at 21:24 UTC
    Two problems:
    1. You permit "..". Nasty.
    2. Handled properly, there are no "bad" filenames syntactically. Perhaps they point outside an area of your interest, or create a file with an extension that is significant, but that's a semantic issue that your module can never fully understand, unless the checking behavior is passed in.

    I suggest you give up your endeavor as "futile". There's nothing "generic" to contribute. All the rules will be application-specific, always.
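For illustration, here is a minimal sketch of the kind of application-specific rule the first point alludes to: rejecting any path that contains a '..' component. The helper name has_parent_component is hypothetical and not from the thread, and whether '..' should be rejected at all is a policy decision for the calling application, not something a generic module can know.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the number of path
# components that are exactly '..', treating both '/' and '\' as separators.
sub has_parent_component {
    my ($path) = @_;
    my @hits = grep { $_ eq '..' } split m{[/\\]+}, $path;
    return scalar @hits;
}

for my $path ('docs/readme.txt', '../../etc/passwd', 'C:\\temp\\..\\boot.ini', 'notes..txt') {
    printf "%-25s %s\n", $path, has_parent_component($path) ? 'rejected' : 'ok';
}
```

Splitting into components first means a name such as 'notes..txt' is not flagged, while a '..' sitting between separators is. This only inspects the string; it does not resolve symlinks or check where the path actually points.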

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Wow. Randal Schwartz just labelled this endeavour futile...if I were you, that would be enough for me to go to management for a new set of specs. ;)

        Well, maybe the specs should change. If you're testing for valid paths, but unable or unwilling to determine the originator OS, or whether the file exists, why validate at all except to remove things which 'break' your code? Management might as well ask you for a module that remotely checks that the user has wiped his arse and washed his hands before entering 'tainted' data.


        I can't believe it's not psellchecked
Re: Filepath validation and untainting
by fruiture (Curate) on Feb 12, 2003 at 16:54 UTC

    I don't know if there's another module to do that. I don't think it makes much sense, because by using a filetest operator like -e I can ensure without any danger that a file exists (as a file or a directory), and that implies that the tested path is a valid path for whatever OS I'm on; otherwise it could not point to a file.

    Your regexp in fact only checks for character occurrences, because everything is marked as optional via * and ? (the ()+ expression contains only optional elements), so you can have '//////', 'XXXXXXXXXX', or '///ABC//DEF//', which can all be valid paths in the end. So you're better off just using tr{/\\a-zA-Z0-9.-}{}c to verify that no unwanted characters are present; that has the same effect with much less work for the computer. And even this single limitation is already inaccurate, because a good OS allows more than these characters in filenames.
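As a rough sketch of the tr-based check described above: with the c modifier and an empty replacement list, tr counts the characters that fall outside the search list and leaves the string unchanged. The helper name count_disallowed is hypothetical. Note that the character set shown in the reply is narrower than the character class in the original regex (no underscore, whitespace, or colon), so the two are not interchangeable as written.

```perl
use strict;
use warnings;

# Hypothetical helper: counts characters outside the allowed set.
# tr with /c and an empty replacement list counts the complement of
# the search list without modifying the string.
sub count_disallowed {
    my ($path) = @_;
    my $count = ($path =~ tr{/\\a-zA-Z0-9.-}{}c);
    return $count;
}

for my $path ('///ABC//DEF//', 'my file.txt', 'a;b|c') {
    my $bad = count_disallowed($path);
    print "$path: ", ($bad ? "$bad disallowed character(s)" : 'ok'), "\n";
}
```

A return value of zero means every character is in the allowed set; it says nothing about whether the path is meaningful on any particular filesystem.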

    --
    http://fruiture.de

      . . . by using a filetest operator like -e I can ensure without any danger that a file exists (as a file or a directory), and that implies that the tested path is a valid path for whatever OS I'm on; otherwise it could not point to a file.

      I'd like to check for files that may not exist yet, or might be on a completely different OS. -e just won't cut it.

      So you're better off just using tr{/\\a-zA-Z0-9.-}{}c to verify that no unwanted characters are present; that has the same effect with much less work for the computer.

      The data being returned has to be untainted. tr/// won't do that.

      ----
      Invent a rounder wheel.

        So, what's actually a syntactically impossible path? Can you say much more than what a tr/// check says? (Untainting can be done with a fake regexp like /(.*)/; you don't need anything complicated for that.) What can you really exclude? Not much apart from /\.\.\.+/, right?
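One way to combine the two ideas, as a sketch only: do the character check with tr first, then use a catch-all capture purely to launder the value. The allowed set below (which adds underscore, colon, and space) and the error messages are assumptions made for the example, not something settled in this thread. Under taint mode (-T), a value extracted through a regex capture is considered untainted, so the capture is only defensible because the check runs before it.

```perl
use strict;
use warnings;

# Hypothetical validator following the (value, message) convention from
# the original post. The allowed character set is an assumption.
my $FILE = sub {
    my ($file) = @_;
    return (undef, 'No filepath given')
        unless defined $file && length $file;

    # tr with /c counts characters outside the allowed set.
    if ($file =~ tr{/\\a-zA-Z0-9._: -}{}c) {
        return (undef, "$file contains disallowed characters");
    }

    # Catch-all capture: under -T the captured copy is untainted.
    # It performs no validation of its own.
    my ($clean) = $file =~ /\A(.*)\z/s;
    return ($clean, 'Passed');
};

my ($value, $message) = $FILE->('C:\\temp\\report.txt');
print defined $value ? "ok: $value\n" : "failed: $message\n";
```

Using /(.*)/ without a preceding check would untaint any input at all, which defeats the purpose of taint mode.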

        But still, I can't see the sense in that: why do you want to check whether a filename is syntactically possible? It doesn't give you any hints about the actual possibility of creating or finding such a file...

        --
        http://fruiture.de