sstevens has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks! I have users uploading files on my server, and occasionally they use a filename that the server doesn't like (such as chapter#2.doc). I want to get rid of all non-word characters in the files they upload, but I want to keep the \. for the extension. Right now I have this:
    # get the file from the form and call it $file
    $file =~ s/(.*)\.(.*)$/$1/;
    $ext = $2;
    $file =~ s/\W//g;
    $file = "$file.$ext";
This works fine, but I don't like it. Is there a way to say "match all non-word characters except \."?

Re: Substitute \W but not \.
by merlyn (Sage) on Mar 12, 2008 at 20:49 UTC
      Are you sure? perlunicode says
      (However, and as a limitation of the current implementation, using "\w" or "\W" inside a "[...]" character class will still match with byte semantics.)
      which means that [^\w] is not the same as \W.
        Well, logically, those are arguably the same (at least, have been since day two), so I'd call the current implementation "buggy" if it can't make those the same.

        But thanks, I was unaware of this bug.
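For reference, the single-regex form the question asks for is usually written as a negated character class that lists \w together with a literal dot. This is a minimal sketch, not necessarily what any poster in this thread wrote; the helper name clean_name is hypothetical. On a Perl affected by the perlunicode caveat quoted above, \w inside the brackets may match with byte semantics, so treat the result as an assumption for non-ASCII names.

```perl
use strict;
use warnings;

# Hypothetical helper: delete everything except word characters and dots.
sub clean_name {
    my ($file) = @_;
    # Inside [...] a dot is literal, so it needs no backslash.
    $file =~ s/[^\w.]+//g;
    return $file;
}

print clean_name('chapter#2.doc'), "\n";   # prints "chapter2.doc"
```

Note that this keeps every dot in the name, not only the one before the extension.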

Re: Substitute \W but not \.
by kyle (Abbot) on Mar 12, 2008 at 20:46 UTC

    Well, for plain ASCII data, \w = [a-zA-Z0-9_], so \W = [^a-zA-Z0-9_]. If you want the dot to survive as well, add it to the negated class: [^a-zA-Z0-9_.].

    $file =~ s/[^a-zA-Z0-9_.]//g;

    Or use tr:

    $file =~ tr/a-zA-Z0-9_.//cd;
      This works well for English (ASCII) file names, but may fail under other locales, where letters outside a-z also count as word characters. Character classes such as \w, which honor the locale, are more portable across data sets.
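To make the difference concrete, here is a hedged sketch comparing the fixed ASCII list with a \w-based class on a name containing accented letters. It swaps in Unicode rules via the /u modifier (assuming Perl 5.14 or later) rather than `use locale`, and the helper names clean_ascii and clean_unicode are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: fixed ASCII whitelist, so accented letters are deleted.
sub clean_ascii {
    my ($file) = @_;
    (my $out = $file) =~ tr/a-zA-Z0-9_.//cd;
    return $out;
}

# Hypothetical helper: /u asks for Unicode rules, so \w also matches
# accented letters and they are kept.
sub clean_unicode {
    my ($file) = @_;
    (my $out = $file) =~ s/[^\w.]//gu;
    return $out;
}

# "resume" with two e-acute characters, written with escapes to avoid
# depending on the source file's encoding.
my $name = "r\x{e9}sum\x{e9} #2.doc";

printf "original %d chars, ascii %d, unicode %d\n",
    length($name), length(clean_ascii($name)), length(clean_unicode($name));
```

Which behavior is preferable depends on what the server's file system and the upload code can handle; the ASCII list is the more conservative choice.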
Re: Substitute \W but not \.
by locked_user sundialsvc4 (Abbot) on Mar 12, 2008 at 20:50 UTC

    The tr// (“transliteration”) operator might be useful here, particularly with the “c” modifier, which complements (inverts) the search list. Give it a list of all the characters you will accept, add the “d” modifier and leave the replacement list empty, and it deletes every character that is not in that list.

    See: perldoc perlop, which includes some fairly specific examples.
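As a small illustration of the modifiers described above, the following sketch deletes every character outside an accepted list and reports how many were removed. The sample file name and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $file = 'chapter#2 (final).doc';

# c complements the search list; d deletes the matched characters because
# the replacement list is empty. tr returns the number of characters matched.
my $removed = ($file =~ tr/a-zA-Z0-9_.//cd);

print "$file ($removed removed)\n";   # prints "chapter2final.doc (4 removed)"
```

Because tr does no regex matching, the accepted list is a plain set of characters and ranges; classes such as \w are not available there.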

    There are also file-handling packages on CPAN that handle such things as “clean up this filename in an appropriate way for this system, whatever it is,” which might be what you really want to achieve. It bears remembering that whatever you are trying to accomplish (the goal, as opposed to the steps you planned to take), somebody on CPAN has probably already been there. If that's the case, you can probably just “hitch a ride on their broom.”
