in reply to Re: Re: Untainting 'bad' filenames
in thread Untainting 'bad' filenames

The program knows the filename, but I can't anticipate all permutations in advance

I'm not sure about the taint feature in this respect, so I'll refrain from commenting further on that issue. But with the filename, when I say you do know the filename, it is in relation to the script. IOW, your spec seems to state that you do know the format of the filename, but not the filename.

But in order to compare the filename to a regex, you (the script is simply an extension of you; be the script :) have to know the filename. The regex shouldn't check all permutations of the name. It should check valid permutations.

In which case, you can write a very tight regex, since it is based on your valid filename. I think you're taking too many variables into account in solving your problem. I see a single variable, the filename, and a single control, the format the filename should match. That makes it a very binary operation: it matches or it doesn't. What is it that I'm missing in this discussion? (This is purely discussion, since it seems as though someone may have provided a solution that you will use.) I'm interested in case I ever see this problem myself.
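To make the "matches or it doesn't" idea concrete, here is a minimal sketch. The format `report_YYYYMMDD.txt` and the helper name `is_good_name` are my own assumptions for illustration; the thread never states the real format.

```perl
use strict;
use warnings;

# Hypothetical format, not taken from the thread: report_YYYYMMDD.txt
# \A and \z anchor the whole string, so a trailing newline does not slip through.
my $valid = qr/\Areport_(\d{8})\.txt\z/;

# Returns 1 if the name fits the format, 0 otherwise.
sub is_good_name {
    my ($name) = @_;
    return $name =~ $valid ? 1 : 0;
}

for my $name ('report_20001208.txt', 'report 2000;rm.txt') {
    printf "%-22s %s\n", $name, is_good_name($name) ? 'good' : 'bad';
}
```

Anything that fails the check is a "bad" filename by definition, whatever odd characters it happens to contain.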

ALL HAIL BRAK!!!

Re: Re: Re: Re: Untainting 'bad' filenames
by chipmunk (Parson) on Dec 08, 2000 at 22:42 UTC
    Psychospunk, what you're missing is that doran is asking how to deal with bad filenames, that is, filenames that don't fit the specified format. How should he untaint those filenames, which are in an unknown format, so that he can safely pass the filenames to the rename() function?
      Oh, but I was under the impression that he was simply moving the file if it didn't match and technically not renaming it. But, I do see how inspecting the file would be made more difficult if you don't know the format.

      Wouldn't the rename() be easy to accomplish if he stripped any characters from the filename that would cause issues? I guess what I'm failing to see is what happens to bad filenames after they're moved. I'm looking at the problem as if we have good filenames and bad filenames. If a filename isn't good, I need to move the file elsewhere. Thus, I need to know whether the filename has any characters that would cause the rename function to explode. After that, I would simply use another script to inspect the internals of the files considered bad.

      The environment seems to be controlled, in the sense that both directories are only accessible to "trusted users". I may be wrong about that. But if that's the case, then what is the difference between inspection before moving and inspection afterwards? I'm pressing on this because I want to know why the previously suggested way of checking the file is better than this idea of having a second script check the bad files.

      ALL HAIL BRAK!!!

        Renaming a file is basically the same as moving it. In Perl they're both done with the rename() function; in the Unix shell, with `mv`.

        If he strips any problematic characters from the filename, then the rename() function would be trying to rename a file that doesn't exist, because the stripped name no longer matches the name on disk. So that won't work.
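A small sketch of why stripping fails, assuming a made-up filename and a scratch directory from File::Temp (neither is from doran's actual setup). The stripped string is a different name, so rename() has no source file to work on.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # scratch directory for the demo
my $odd = 'bad name;x.txt';        # made-up "bad" filename

# Create the oddly named file on disk.
open my $fh, '>', "$dir/$odd" or die "create: $!";
close $fh;

# Strip everything except word characters, dots and hyphens.
(my $stripped = $odd) =~ s/[^\w.-]//g;

# The source path built from the stripped name does not exist,
# so rename() returns false and sets $!.
my $ok = rename "$dir/$stripped", "$dir/moved.txt";
print "rename from stripped name: ", ($ok ? "ok" : "failed: $!"), "\n";
```

The original file is still sitting in the directory under its original name afterwards.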

        Personally, I think the appropriate solution here is just to use /^(.*)$/s to untaint the filename. That would be bad if the input were coming from a user, because the user might specify a file that he doesn't want to move. In this case the input is coming from the filesystem, and he knows he wants to move the file no matter what odd characters the name contains, so using that regex to untaint makes sense.
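Roughly, that could look like the sketch below. The directories, the sample filename and the helper `untaint_all` are stand-ins I made up; I'm assuming the names come from readdir and that source and destination are on the same filesystem. The script runs with or without taint mode; pass -T on the command line to try it under taint checks.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Capture the entire string; the capture is what untaints it under -T.
# The /s modifier lets "." match newlines, so nothing is left out.
sub untaint_all {
    my ($name) = @_;
    my ($clean) = $name =~ /^(.*)$/s;
    return $clean;
}

my $src = tempdir(CLEANUP => 1);   # stand-in for the incoming directory
my $dst = tempdir(CLEANUP => 1);   # stand-in for the "bad files" directory
my $odd = 'odd name;$(x).txt';     # made-up "bad" filename

open my $fh, '>', "$src/$odd" or die "create: $!";
close $fh;

# Names read from the filesystem are tainted under -T.
opendir my $dh, $src or die "opendir: $!";
my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

my $moved = 0;
for my $name (@names) {
    my $clean = untaint_all($name);
    rename "$src/$clean", "$dst/$clean" or die "rename: $!";
    $moved++;
}
print "moved $moved file(s)\n";
```

Without the /s, a name containing an embedded newline would not be captured whole, which is why the modifier matters here. Since rename() takes the name directly and no shell is involved, the odd characters are harmless to it.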