PerlMonks  

Re^2: Removing unwanted chars from filename.

by kcott (Archbishop)
on Oct 06, 2022 at 22:40 UTC (id://11147278)


in reply to Re: Removing unwanted chars from filename.
in thread Removing unwanted chars from filename.

G'day haukex,

"... (Update: and though tr/A-Za-z0-9._-//cd should be faster, the above module handles Unicode well, so that's why I'd still recommend that)"

I wasn't aware that transliteration would have a problem with Unicode. Here's a quick test:

$ perl -Mutf8 -E '
    my $s = " abc \t ©︎ αβ гдж سشص ᚠᚢᚸ ⎈ ☂  .png";
    $s =~ tr/A-Za-z0-9._-//cd;
    say $s;
'
abc.png

I'm using Perl v5.36; are there issues with earlier versions?

I tested with a fair selection of Unicode characters but, obviously, I can't reasonably test them all. Are there problems with Unicode characters I didn't test?
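Rather than spot-checking, you can push every code point in a range through the same tr and see what survives. This is only a minimal sketch. The range U+0080 to U+D7FF is an arbitrary choice that stops below the surrogate block, and the variable names are illustrative:

```perl
use strict;
use warnings;

# Every code point from U+0080 to U+D7FF (stopping below the surrogates).
my $all    = join '', map { chr($_) } 0x80 .. 0xD7FF;
my $before = length $all;

# Same transliteration as in the one-liner, applied to a copy.
(my $left = $all) =~ tr/A-Za-z0-9._-//cd;

printf "%d code points in, %d left after tr\n", $before, length $left;
# prints "55168 code points in, 0 left after tr"
```

Because /c complements the search list, every character outside A-Z, a-z, 0-9, '.', '_' and '-' is deleted. That includes non-ASCII letters and digits. Nothing is transliterated; it is simply removed.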

— Ken

Re^3: Removing unwanted chars from filename.
by haukex (Archbishop) on Oct 07, 2022 at 06:02 UTC

    I was referring to the fact that tr simply clobbers all non-ASCII characters, while Text::CleanFragment uses Text::Unidecode to try to turn them into ASCII:

    use warnings;
    use strict;
    use utf8;
    use Text::CleanFragment;
    
    my $s = "Ｈｅｌｌｏ．ｔｘｔ";  # fullwidth look-alikes, not ASCII
    print clean_fragment($s), "\n";  # prints "Hello.txt"
    $s =~ tr/A-Za-z0-9._-//cd;
    print "<$s>\n";  # prints "<>" !
    

    (I've actually encountered filenames similar to the above in the wild)
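If Text::CleanFragment isn't an option, you can roughly approximate the "turn it into ASCII first, then strip" idea with core modules only. Apply a compatibility decomposition (NFKD from Unicode::Normalize) before the tr. This is a swapped-in technique, not what Text::CleanFragment or Text::Unidecode actually do. NFKD only folds characters that have a decomposition, such as fullwidth forms and accented Latin letters, so Greek, Cyrillic and similar scripts are still deleted outright. The helper name ascii_filename and the 'unnamed' fallback are hypothetical:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: decompose, strip, and fall back if nothing survives.
sub ascii_filename {
    my ($name, $fallback) = @_;
    my $s = NFKD($name);                  # fullwidth H -> H, e-acute -> e + U+0301
    $s =~ tr/A-Za-z0-9._-//cd;            # delete everything outside the safe set
    return length($s) ? $s : $fallback;   # never hand back an empty filename
}

my @names = (
    # fullwidth "Hello.txt" (U+FF28 U+FF45 ...)
    "\x{FF28}\x{FF45}\x{FF4C}\x{FF4C}\x{FF4F}\x{FF0E}\x{FF54}\x{FF58}\x{FF54}",
    "Caf\x{E9}.pdf",          # precomposed e-acute
    "\x{3B1}\x{3B2}\x{3B3}",  # Greek alpha, beta, gamma: no decomposition
);

my @clean = map { ascii_filename($_, 'unnamed') } @names;
print "$_\n" for @clean;   # Hello.txt, Cafe.pdf, unnamed (one per line)
```

The fallback guards against the empty "<>" result from a plain tr. For names in non-Latin scripts, a real transliteration table like the one in Text::Unidecode is still the better choice.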

      Thanks for the clarification.

      — Ken
