G'day haukex,

"... (Update: and though tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, so that's why I'd still recommend that)"

I wasn't aware that transliteration would have a problem with Unicode. Here's a quick test:

$ perl -Mutf8 -E '
    my $s = " abc \t ©︎ αβ гдж سشص ᚠᚢᚸ ⎈ ☂  .png";
    $s =~ tr/A-Za-z0-9._-//cd;
    say $s;
'
abc.png

I'm using Perl v5.36; are there issues with earlier versions?

I tested with a fair selection of Unicode characters but, obviously, I can't reasonably test them all. Are there problems with Unicode characters I didn't test?

— Ken

Re^3: Removing unwanted chars from filename.
by haukex (Archbishop) on Oct 07, 2022 at 06:02 UTC

    I was referring to the fact that the tr simply clobbers all Unicode characters, while Text::CleanFragment uses Text::Unidecode to try to turn them into ASCII:

    use warnings;
    use strict;
    use utf8;
    use Text::CleanFragment;
    
    my $s = "Ｈｅｌｌｏ．ｔｘｔ";  # fullwidth characters (U+FF28 etc.), not ASCII
    print clean_fragment($s), "\n";  # prints "Hello.txt"
    $s =~ tr/A-Za-z0-9._-//cd;
    print "<$s>\n";  # prints "<>" !
    

    (I've actually encountered filenames similar to the above in the wild)
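
    For anyone who would rather stay with core modules, a rough middle ground might be to run the name through a compatibility decomposition before applying the tr. The sketch below is only an illustration, not what Text::CleanFragment does internally: ascii_filename is a hypothetical helper name, and it assumes that losing every character without an ASCII decomposition (Cyrillic, Arabic, runes, symbols) is acceptable. The sample strings are made up for the demonstration.

    ```perl
    use strict;
    use warnings;
    use Unicode::Normalize qw(NFKD);   # core module

    # Hypothetical helper: decompose first, then keep only ASCII filename characters.
    sub ascii_filename {
        my ($name) = @_;
        # NFKD maps fullwidth forms to their ASCII counterparts and splits
        # accented letters into base letter + combining mark.
        my $decomposed = NFKD($name);
        # Delete everything outside the allowed set, including the combining marks.
        $decomposed =~ tr/A-Za-z0-9._-//cd;
        return $decomposed;
    }

    # Fullwidth "Hello.txt", written with escapes so the source stays ASCII.
    my $fullwidth = "\x{FF28}\x{FF45}\x{FF4C}\x{FF4C}\x{FF4F}\x{FF0E}\x{FF54}\x{FF58}\x{FF54}";
    print ascii_filename($fullwidth), "\n";                           # prints "Hello.txt"
    print ascii_filename("Cr\x{E8}me br\x{FB}l\x{E9}e.txt"), "\n";    # prints "Cremebrulee.txt"
    print "<", ascii_filename("\x{433}\x{434}\x{436}"), ">\n";        # Cyrillic: prints "<>"
    ```

    The last line shows the limit of this approach: characters with no decomposition into ASCII are still deleted, which is where a real transliteration table such as the one in Text::Unidecode does better.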

      Thanks for the clarification.

      — Ken