PerlMonks  

Re: Removing unwanted chars from filename.

by haukex (Archbishop)
on Oct 06, 2022 at 17:39 UTC (id://11147276)


in reply to Removing unwanted chars from filename.

I would strongly recommend Corion's Text::CleanFragment.

As for your regex, note that [:ascii:] is defined as "Any character in the ASCII character set", and the string you've shown here is entirely ASCII, so your code is "working". Perhaps you meant s/[^[:alnum:]]//g or e.g. s/[^[:alnum:]._-]//g instead? (Update: and though tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, so that's why I'd still recommend that)
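To make the difference concrete, here is a small comparison of the three approaches. It uses a hypothetical all-ASCII filename, `my file (1).png`; the original poster's string isn't reproduced here. Note that under Perl's Unicode rules `[:alnum:]` also matches non-ASCII letters, so the regex and the `tr` only agree on ASCII input like this one.

```perl
use strict;
use warnings;

# Hypothetical example filename (all ASCII).
my $name = 'my file (1).png';

# [^[:ascii:]] only matches non-ASCII characters,
# so an all-ASCII name comes through unchanged.
(my $ascii = $name) =~ s/[^[:ascii:]]//g;

# Keep only alphanumerics plus dot, underscore, and hyphen.
(my $regex = $name) =~ s/[^[:alnum:]._-]//g;

# Transliteration: /c complements the list, /d deletes the matches.
(my $tr = $name) =~ tr/A-Za-z0-9._-//cd;

print "$ascii\n";  # my file (1).png
print "$regex\n";  # myfile1.png
print "$tr\n";     # myfile1.png
```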

Re^2: Removing unwanted chars from filename.
by kcott (Archbishop) on Oct 06, 2022 at 22:40 UTC

    G'day haukex,

    "... (Update: and though tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, so that's why I'd still recommend that)"

    I wasn't aware that transliteration would have a problem with Unicode. Here's a quick test:

    $ perl -Mutf8 -E '
        my $s = " abc \t ©︎ αβ гдж سشص ᚠᚢᚸ ⎈ ☂  .png";
        $s =~ tr/A-Za-z0-9._-//cd;
        say $s;
    '
    abc.png
    

    I'm using Perl v5.36; are there issues with earlier versions?

    I tested with a fair selection of Unicode characters but, obviously, I can't reasonably test them all. Are there problems with Unicode characters I didn't test?

    — Ken

      I was referring to the fact that the tr simply clobbers all Unicode characters, while Text::CleanFragment uses Text::Unidecode to try to turn them into ASCII:

      use warnings;
      use strict;
      use utf8;
      use Text::CleanFragment;
      
      my $s = "Ｈｅｌｌｏ．ｔｘｔ";  # example name in fullwidth characters (U+FF28 etc.), not plain ASCII
      print clean_fragment($s), "\n";  # prints "Hello.txt"
      $s =~ tr/A-Za-z0-9._-//cd;
      print "<$s>\n";  # prints "<>" !
      

      (I've actually encountered filenames similar to the above in the wild)
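If installing Text::CleanFragment isn't an option, one rough approximation using only core modules is to apply Unicode compatibility decomposition (NFKD, from Unicode::Normalize) before the `tr`. This is a swapped-in technique, not how Text::CleanFragment or Text::Unidecode work: it only rescues characters whose decomposition contains ASCII (fullwidth forms, accented Latin letters), while Greek, Cyrillic and the like are still deleted. The helper name `ascii_fragment` is hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper, not part of Text::CleanFragment.
sub ascii_fragment {
    my ($name) = @_;
    # Compatibility decomposition: fullwidth "Ｈ" becomes "H", and
    # "é" becomes "e" followed by a combining acute accent (U+0301).
    my $decomposed = NFKD($name);
    # Delete everything outside the allowed ASCII set, including
    # the combining marks left over from the decomposition.
    $decomposed =~ tr/A-Za-z0-9._-//cd;
    return $decomposed;
}

print ascii_fragment("Ｈｅｌｌｏ．ｔｘｔ"), "\n";  # Hello.txt
print ascii_fragment("café.png"), "\n";            # cafe.png
print ascii_fragment("αβ.png"), "\n";              # .png
```

Compared with a plain `tr`, this keeps the fullwidth example readable, but for real-world names the transliteration tables in Text::Unidecode cover far more scripts.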

        Thanks for the clarification.

        — Ken
