Plankton has asked for the wisdom of the Perl Monks concerning the following question:

Friends,

I am trying to write a script that will take a directory as an argument and look for files that have unwanted characters in their names. For example, the script would work like this ...
bash-2.03$ ls -l DIR/
total 4
-rw------- 1 pk ton 180 Dec 1 12:45 junk
-rw------- 1 pk ton 20 Dec 1 12:45 rm -f .
bash-2.03$
bash-2.03$ ./testFNI.pl DIR/
./testFNI.pl : ILLEGAL FILE NAME in DIR [rm -f .]
Here is what I have tried so far ...
#!/usr/local/bin/perl -w
use strict;
use File::Copy;
use File::Basename;

sub illegalFile {
    my $dir=shift;
    #look for dangerous file name
    #for my $FN ( glob( '$dir/*"[ |\~|\?|\<|\>|\,|\`|\!|\@|\#|\%|\^|\&|\*|\||\(|\)|\;|\+|\=|\{|\}|\\|\[|\]|\-]"*' )) {
    #for my $FN ( glob( "$dir/*[ \~\?\<\>\,\`\!\@\#\%\^\&\*\|\(\)\;\+\=\{\}\\\[\]\-]*" )) {
    my $pat = $dir . "/" . quotemeta ("*[ ~?<>,`!@#%^&*|();+={}\[]-");
    for my $FN ( glob $pat ) {
        print "$0 : ILLEGAL FILE NAME in $dir [$FN]\n";
        my $PATH_HAVING_ILLEGAL_FILE=dirname $dir;
        my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);
        my $DIR_HAVING_ILLEGAL_FILE="$PATH_HAVING_ILLEGAL_FILE/ILLEGAL_FILE_NAME_" . $year . "_" . $mon . "_" . $mday . "_" . $hour . "_" . $min . "_" . $sec;
        mkdir $DIR_HAVING_ILLEGAL_FILE;
        move( $dir, $DIR_HAVING_ILLEGAL_FILE );
        mkdir $dir;
        mailError ("illegal file name detected!");
        exit 1;
    }
    ### END look for dangerous file name
}

my $dir = shift;
if ( !$dir ) {
    die "usage : $0 <dir>\n";
}
illegalFile ($dir);
... this code doesn't find the bad file name. Other things I have tried think every file is bad!


Plankton: 1% Evil, 99% Hot Gas.

Re: using glob to find "unwanted" file names
by Zaxo (Archbishop) on Dec 01, 2003 at 21:28 UTC

    You're trying to use perl regexen as glob patterns. That doesn't work, as you found. Write those as csh-style shell globs. For instance, to pick out names with punctuation:

    my @badnames = glob( $dir . '/*{!,@,#,$,%,^,&}*');

    You may have to play around with the patterns, but you don't need quotemeta. File::Glob takes care of globbing no matter what the platform, and the shell is not called.

    After Compline,
    Zaxo

      my @badnames = glob( $dir . '/*{!,@,#,$,%,^,&}*');

      Can't that be simplified to:

      my @badnames = glob( $dir . '/*[!@#$%^&]*' );
      It works for me [1].

                      - tye

      [1] Though not tested with punctuation characters, as I didn't have any files with punctuation in their names for use in my quick test and the nearest operating system sanely refuses to put insane characters into file names.

        Yes, that's certainly better when testing for single characters.

        After Compline,
        Zaxo
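
One caveat on the bracket form: in the BSD-style glob implementation behind File::Glob, a `!` placed first inside `[...]` negates the class, so `[!@#$%^&]` matches any character other than those listed and the pattern picks up nearly every file. The sketch below is only an illustration of that difference. It calls bsd_glob directly, and the scratch directory and the two file names are invented for the demonstration.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory (assumption for the demo) holding one plain name
# and one name that contains '&'.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'junk', 'a&b' ) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# '!' first: the class is negated, so any character other than @#$%^& matches.
my @negated = bsd_glob( $dir . '/*[!@#$%^&]*' );

# '!' moved to the end: the class now lists the unwanted characters themselves.
my @listed = bsd_glob( $dir . '/*[@#$%^&!]*' );

print scalar(@negated), " name(s) matched with '!' first\n";
print scalar(@listed),  " name(s) matched with '!' last\n";
```

Moving `!` anywhere but the first position keeps it a literal member of the class.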

      Ahhh that's much better! Thanks!

      Plankton: 1% Evil, 99% Hot Gas.
      Hi Zaxo,
      Again thanks ... maybe you can answer this question too?
      I have this now ...
      my @badnames = glob( $dir . '/*{!,@,#,$,%,^,&,\ ,(,),[,|,<,>,;,+,=,`,~}*');
      This works fine, but when I try to add ] or } or \ everything matches. I have tried \] \} and \\. The funny thing is that [ and { work just fine.

      Plankton: 1% Evil, 99% Hot Gas.
•Re: using glob to find "unwanted" file names
by merlyn (Sage) on Dec 01, 2003 at 21:17 UTC
    There's no such thing as an illegal character in a filename that's already in a filename. The operating system would have prevented it.

    Perhaps you underestimate the ability of standard Unix tools to deal with such odd names. Perhaps you should code your own utilities to take the same care with such names. That'd be a better long-term strategy, instead of trying to find filenames that you find objectionable for no technical reason.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      I didn't mean illegal from an OS point of view. I am trying to defeat "illegal" cross-site scripting. I thought that would have been obvious; I guess I was wrong. Thanks for all your help.

      Plankton: 1% Evil, 99% Hot Gas.
Re: using glob to find "unwanted" file names
by Roy Johnson (Monsignor) on Dec 01, 2003 at 22:44 UTC
    You need a trailing star in your glob pattern (it currently appears to have a trailing dash). Also, excluding all files with dashes is a bit of overkill. You can eliminate space-dashes by eliminating spaces:
    for my $FN ( glob "$dir/".'*[\ \\?<>,`!@#$%^&*();+={}[]*' ) {
    I can eliminate opening brackets, but when I try to escape a closing bracket, suddenly nothing matches.

    Note that you could get the same effect with

    for my $FN ( grep /[ \\?<>,`!@#$%^&*();+={}[\]]/, <*>)
    Season to taste.

    The PerlMonk tr/// Advocate
Re: using glob to find "unwanted" file names
by Jenda (Abbot) on Dec 01, 2003 at 23:36 UTC

    I'd rather use opendir/readdir/closedir. That's more likely to be portable.

    But if the files are on the disk it's already too late. You should make sure you do not create such files in the first place.

    Jenda
    Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
       -- Rick Osborne
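
For anyone who wants to try the opendir/readdir route suggested above, here is a minimal sketch. It tests each name against a regex character class instead of a glob pattern, which sidesteps the trouble with `]`, `}` and `\` in glob patterns. The helper name `unwanted_names`, the exact set of characters, and the scratch files are assumptions made for illustration, not anything from the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Characters treated as unwanted (an assumed set; season to taste).
# '@' and '$' are escaped so Perl does not try to interpolate them.
my $unwanted = qr/[ \\?<>,`!\@#\$%^&*();+={}\[\]|~]/;

# Hypothetical helper: return the sorted names in $dir that contain
# at least one unwanted character.
sub unwanted_names {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @bad = grep { $_ ne '.' && $_ ne '..' && $_ =~ $unwanted } readdir $dh;
    closedir $dh;
    my @sorted = sort @bad;
    return @sorted;
}

# Demonstration on a scratch directory with made-up file names.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'junk', 'rm -f .', 'a]b', 'c}d' ) {
    open my $fh, '>', "$dir/$name" or die "cannot create '$name': $!";
    close $fh;
}

my @found = unwanted_names($dir);
print "unwanted name in $dir: [$_]\n" for @found;
```

Because readdir returns bare names, the directory path never passes through pattern matching, so odd characters in the path itself cannot change the result.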


Re: using glob to find "unwanted" file names
by Aristotle (Chancellor) on Dec 03, 2003 at 07:30 UTC
    I don't get it. Why is rm -f . to be considered an illegal filename? Are you trying to execute filenames in a shell? If so, how about not doing that, then?

    Makeshifts last the longest.