in reply to One true regexp for untainting windows filenames?

Strictly speaking it is the filesystem which defines which characters are legal, not the operating system. This means that a drive shared between *nix and Windows (using, say, Samba) can have an interesting effect on file naming. Also remember that NTFS filenames can contain Unicode characters.
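As a rough illustration, here is a minimal Perl sketch that untaints a single Windows file-name component. It follows the Win32 API naming rules, which are an assumption here and not any particular filesystem's rules (a Samba share or a FAT volume may differ). Those rules reject control characters, the characters `< > : " / \ | ? *`, a trailing dot or space, and reserved device names, and they allow Unicode. The helper name `untaint_win32_component` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: untaint ONE Windows file-name component (no separators).
# Assumes the Win32 API naming rules, not the rules of a specific filesystem:
#   - no control characters U+0000..U+001F
#   - none of < > : " / \ | ? *
#   - must not end in a dot or a space
#   - must not be a reserved device name (CON, NUL, COM1, ...), even with an extension
# Non-ASCII (Unicode) characters are allowed.
sub untaint_win32_component {
    my ($name) = @_;
    return unless defined $name;

    # Capture the whole string only if every character is allowed and the
    # last one is not '.' or ' '; the capture in $1 is what untaints under -T.
    my ($clean) = $name =~ m{\A([^\x00-\x1F<>:"/\\|?*]*[^\x00-\x1F<>:"/\\|?*. ])\z}
        or return;

    # Reserved device names are rejected with or without an extension ("NUL.txt").
    return if $clean =~ m{\A(?:CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(?:\..*)?\z}i;

    return $clean;
}
```

A full path would still need to be split into volume, directories and components first; this only checks one component.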

Take a look at the source of File::Basename (File/Basename.pm), which ships with core Perl.
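For reference, a small sketch of File::Basename parsing a Windows path on any platform. `fileparse_set_fstype('MSWin32')` switches the module to DOS/Windows parsing rules; the path itself is just an example. Note that this only splits the path and does not validate it.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Use DOS/Windows path rules regardless of the OS this runs on.
# (This is a global setting for File::Basename.)
fileparse_set_fstype('MSWin32');

# Split into file name, directory part and a suffix matched by the pattern.
my ($name, $dir, $suffix) = fileparse('C:\\Users\\me\\notes.txt', qr/\.[^.]*/);
print "dir=$dir name=$name suffix=$suffix\n";   # dir=C:\Users\me\ name=notes suffix=.txt
```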
  • Comment on Re: One true regexp for untainting windows filenames?

Replies are listed 'Best First'.
Re^2: One true regexp for untainting windows filenames?
by jaldhar (Vicar) on Jan 08, 2009 at 23:42 UTC

    Thanks for the tip. I found slightly more understandable code in File::Spec which has resulted in the following regexps: for Unix...

    qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;
    ...and Windows (includes UNC paths)...
    qr{(\A (?: [a-zA-Z]: | (?:\\\\|//)[^\\/]+[\\/][^\\/]+ )? (?:.*[\\/](?:\.\.?\Z(?!\n))?)? .* )}msx;
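Instead of copying regexes out of File::Spec, you can call File::Spec::Win32 directly. It is plain Perl, so it loads on non-Windows systems too. A sketch follows (the paths are examples only). `splitpath` splits a path into volume, directories and file. It does not reject illegal characters, so it is a parser, not an untainting filter.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# Drive-letter path: volume, directory part, file name.
my ($vol, $dirs, $file) = File::Spec::Win32->splitpath('C:\\Users\\me\\notes.txt');
print "vol=$vol dirs=$dirs file=$file\n";   # vol=C: dirs=\Users\me\ file=notes.txt

# UNC path: the \\server\share prefix is treated as the volume.
my ($unc_vol) = File::Spec::Win32->splitpath('\\\\server\\share\\dir\\file.txt');
print "unc volume=$unc_vol\n";              # unc volume=\\server\share
```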

    --
    જલધર

      There is no string that

      qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx

      won't match: every part of it is optional, and nothing anchors its end.
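A quick check of that claim. The pattern is copied verbatim from the post above. The test inputs are arbitrary examples, including the empty string and a string with an embedded NUL byte.

```perl
use strict;
use warnings;

# The Unix pattern from the post above, used as an "untainting" filter.
my $unix_re = qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;

# Every part is optional and the end isn't anchored, so every input matches.
my @inputs  = ('', 'foo', '/etc/passwd', "x/xx\0xx", "a\nb");
my $matched = grep { $_ =~ $unix_re } @inputs;
print "$matched of ", scalar(@inputs), " inputs matched\n";   # 5 of 5 inputs matched

# The NUL byte survives into the captured ("untainted") value.
my ($captured) = "x/xx\0xx" =~ $unix_re;
```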

      It's wrong for two reasons.

      • "foo" gets "untainted" as "".
      • "x/xx\0xx" is believed to be a valid file name, but it isn't.

      Valid Unix paths, and only valid Unix paths, match

      qr{^([^\0]+)\z}

      (That doesn't mean a file could ever exist at that path, though.)
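Here is a sketch of that rule used for untainting. A path is accepted if it is a non-empty string without NUL bytes. The helper name `untaint_unix_path` is hypothetical. As noted above, a match says nothing about whether the path exists or whether a file could exist there (for example, because of length limits).

```perl
use strict;
use warnings;

# Hypothetical helper: untaint a Unix path by syntax alone.
# Accepts any non-empty string that contains no NUL byte.
sub untaint_unix_path {
    my ($path) = @_;
    return unless defined $path;
    # Under perl -T, $1 from a successful match is untainted.
    return $path =~ m{^([^\0]+)\z} ? $1 : undef;
}

my $clean = untaint_unix_path('/tmp/report.txt');
defined $clean or die "not a valid Unix path\n";
```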