merrymonk has asked for the wisdom of the Perl Monks concerning the following question:

I want to make sure that the ‘name’ given for part of a file or directory name does not contain any forbidden characters.
The Perl code below does that.
However, it would be ‘nicer’ if the loop that checks each character of the given ‘name’ were replaced by a regular expression.
I have tried to implement this but failed. I think one of my problems is that at least some of the forbidden characters have a special meaning in a regular expression and therefore need careful handling.
I would be grateful if any Monk could tell me what this regular expression should be.
use strict;
#
my ($given_file_str, $message_ret, $return_code);
#
#======================================================
#
# test_nonvalid_filechr
#
# Tests the given string for characters that are not valid in a file name.
#
# arguments
#
# 1 $given_file_str   given file/directory name string
# 2 $ref_message_ret  reference to the message set on failure
# 3 $ref_return       reference to the returned code
#                       1 ok
#                       0 failure
#
#======================================================
sub test_nonvalid_filechr($$$) {
    my ($given_file_str, $ref_message_ret, $ref_return) = @_;
    my (@bdchr, $bdchr_item, $j, $given_total);

    @bdchr = ('\\', '/', ':', '*', '?', '"', '<', '>', '|');
    $$ref_message_ret = '';
    $$ref_return = 1;
    $given_total = length($given_file_str);
    foreach $bdchr_item (@bdchr) {
        for ($j = 0; $j < $given_total; $j++) {
            if (substr($given_file_str, $j, 1) eq $bdchr_item) {
                $$ref_message_ret .= $bdchr_item . ' ';
                $$ref_return = 0;
            }
        }
    }
}
# test data
$given_file_str = 'a b c *';
test_nonvalid_filechr($given_file_str, \$message_ret, \$return_code);
print "\nfile name <$given_file_str> message <$message_ret> code <$return_code>\n";
$given_file_str = '\ / : * ? " < > |';
test_nonvalid_filechr($given_file_str, \$message_ret, \$return_code);
print "\nfile name <$given_file_str> message <$message_ret> code <$return_code>\n";

Replies are listed 'Best First'.
Re: Regular expression to check for invalid file name characters
by JavaFan (Canon) on Feb 22, 2010 at 11:11 UTC
    sub is_valid_name {$_[0] !~ m{[\\/:*?"<>|]}}
    Note, however, that this allows an empty name (which isn't valid in most filesystems). Also note that which characters are valid depends on the filesystem. For instance, most UNIX filesystems allow any character except "\x{00}" and "/": the former is the C string terminator, the latter the path separator.
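
    A minimal sketch of how the one-liner above could be extended to reject an empty name and to report which characters were found, as the original sub's message did. The names invalid_chars and is_valid_name_strict are hypothetical, and the character class is simply the list from the question, not a rule that holds on every filesystem.

```perl
use strict;
use warnings;

# Forbidden characters taken from the question's list. This is an assumption:
# the real set depends on the filesystem.
my $BAD = qr{[\\/:*?"<>|]};

# Return the distinct forbidden characters found in $name,
# in order of first appearance.
sub invalid_chars {
    my ($name) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $name =~ /($BAD)/g;
}

# True (1) only for a non-empty name with no forbidden characters.
sub is_valid_name_strict {
    my ($name) = @_;
    return length($name) && $name !~ $BAD ? 1 : 0;
}

for my $name ('a b c *', 'report.txt', '') {
    my @bad = invalid_chars($name);
    printf "name <%s> valid <%d> bad <%s>\n",
        $name, is_valid_name_strict($name), join(' ', @bad);
}
```

    Inside a bracketed character class only the backslash needs escaping here; *, ?, | and the angle brackets are literal, which is probably what tripped up the original attempt.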

      Maybe a character class of what is allowed would be a better approach; users can be very inventive :)

      sub is_valid_name { $_[0] =~ m{[a-zA-Z0-9_ ]} }

      Update: Oops, still learning ... it needs a negated character class.

      sub is_valid_name { $_[0] !~ m{[^a-zA-Z0-9_ ]} }
      Update2: JavaFan is right, this is not the right way to do it.

      Update3: ... and I need my salary and don't want to upset the rest of the world ;)

        So, no Unicode or accented characters or even dots or hyphens? Your fellow Chinese, Korean, French, Swedish, Indian and Arabic Perlmonks will not be happy with that. Neither will your fellow Perl or C coders. (Nope, can't have Module.pm. Nor main.c. salary_run_2010-02? No payment for you!)

        Sometimes, it's just easier to list what you want to exclude.
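
        If an allow-list is still wanted, the objections above could be met with Unicode properties instead of a-zA-Z0-9. This is only a sketch under an assumed policy (letters, marks and digits from any script, plus underscore, space, dot and hyphen); is_allowed_name is a hypothetical name, and whether the policy suits a given filesystem is a separate question.

```perl
use strict;
use warnings;

# Assumed policy: letters, combining marks and digits from any script,
# plus underscore, space, dot and hyphen. "." and ".." are rejected
# because they name directories, not new entries.
sub is_allowed_name {
    my ($name) = @_;
    return 0 if $name eq '.' || $name eq '..';
    return $name =~ /\A[\p{L}\p{M}\p{N}_ .-]+\z/ ? 1 : 0;
}

binmode(STDOUT, ':encoding(UTF-8)');   # the test names contain non-ASCII characters

for my $name ('Module.pm', 'salary_run_2010-02', "r\x{e9}sum\x{e9}.txt",
              "\x{4e2d}\x{6587}.txt", 'a*b', '..') {
    printf "%-20s %s\n", $name, is_allowed_name($name) ? 'ok' : 'rejected';
}
```

        The \A ... \z anchors with + also reject the empty string, so no separate length check is needed.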

Re: Regular expression to check for invalid file name characters
by cdarke (Prior) on Feb 22, 2010 at 11:33 UTC
    You might like to take a look at the regular expressions in the code for File::Basename (it's commented and everything). On Microsoft Windows the file will probably be in C:\Perl\lib\File\Basename.pm.
Re: Regular expression to check for invalid file name characters
by ikegami (Patriarch) on Feb 22, 2010 at 16:03 UTC

    Why not just let the OS tell you if it's invalid? There's a bug whereby \0 is treated as the end of the string, so that's the only character you actually have to check for.

    Now, if you're doing this for security reasons, a much better approach is to check for the characters you want to allow (as opposed to those you want to disallow). That way, you don't actually miss any.

      Why not just let the OS tell you if it's invalid?

      How would you do that reliably?

      If you tried creating a file, it might fail because it already existed. If it succeeds, then you'd have to delete it again--assuming the idea is to just check, not actually use. And if it contained a backslash, you might end up creating a file in a subdirectory. You can't stat unless it already exists.

      So what call would you use?



        If it succeeds, then you'd have to delete it again--assuming the idea is to just check, not actually use.

        I assumed the opposite. When we get questions like this, the check is followed by actually opening or creating the file.

        How would you do that reliably?

        Errno provides codes against which you can check $!.
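
        A sketch of that approach, assuming the check is followed by actually creating the file. create_new_file is a hypothetical helper, the reason strings are illustrative, and which error code an invalid name produces depends on the OS and filesystem.

```perl
use strict;
use warnings;
use Errno;                              # makes %! available: $!{ENOENT} etc.
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Try to create $path without clobbering an existing file.
# Returns (filehandle, undef) on success or (undef, reason) on failure.
sub create_new_file {
    my ($path) = @_;
    # Checked by hand, as suggested above for \0.
    return (undef, 'name contains a NUL character') if $path =~ /\0/;
    if (sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL)) {
        return ($fh, undef);
    }
    my $reason = $!{EEXIST} ? 'already exists'
               : $!{ENOENT} ? 'directory missing or name not accepted'
               : $!{EINVAL} ? 'invalid name'
               : $!{EACCES} ? 'permission denied'
               :              "other error: $!";
    return (undef, $reason);
}

my $dir = tempdir(CLEANUP => 1);        # scratch directory, removed at exit
for my $name ('ok.txt', 'ok.txt', 'missing/ok.txt') {
    my ($fh, $why) = create_new_file("$dir/$name");
    print "$name: ", ($fh ? 'created' : "failed ($why)"), "\n";
    close $fh if $fh;
}
```

        O_EXCL separates "already exists" from the other failures, which addresses the objection that a create attempt might fail for the wrong reason.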

Re: Regular expression to check for invalid file name characters
by Anonymous Monk on Feb 22, 2010 at 10:45 UTC
Re: Regular expression to check for invalid file name characters
by edeca (Acolyte) on Feb 23, 2010 at 10:54 UTC
    You would be much better off using the valid_filename sub from File::Util. Check the manual for it.

    File::Util also has escape_filename, to remove/escape bad characters from a path.

    Both of these are better than writing your own.