Re^2: Contacting the author of a module?

I have a bunch of files whose names are full of Unicode characters. I've always been a bit confused by how Unicode and ISO-Latin coexist in Perl, and since I've never really needed Unicode, I haven't worried about it much. The *only* thing I want to do with these files is fix their names by eliminating the Unicode characters. I expect the more Unicode-competent Perl folk will be horrified, but this is the routine I use to un-Unicode the file names:

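# Remove every character above U+00FF (ASCII and Latin-1 are kept)
sub sanitize
{   my $unicode = $_[0] ;
    my $ascii = '' ;    # start empty so an all-Unicode name yields '' rather than undef
    for my $char (split(//, $unicode))
    {   $ascii .= $char if ord($char) < 256  }
    return $ascii;
}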
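
To make the effect of the `ord($char) < 256` test concrete, here is a minimal sketch that runs the routine on a made-up file name. The sample name and the stricter `sanitize_ascii` variant are assumptions added for illustration and are not part of the original post. The point is that the result is Latin-1 rather than plain ASCII, so accented letters such as `é` survive.

```perl
use strict;
use warnings;

# The routine from the post: keep only code points below 256.
sub sanitize {
    my ($unicode) = @_;
    my $ascii = '';
    for my $char ( split //, $unicode ) {
        $ascii .= $char if ord($char) < 256;
    }
    return $ascii;
}

# Hypothetical stricter variant: keep only 7-bit ASCII.
sub sanitize_ascii {
    my ($unicode) = @_;
    return join '', grep { ord($_) < 128 } split //, $unicode;
}

# Made-up sample: "resume" with two e-acute letters, then two CJK characters.
my $name = "r\x{E9}sum\x{E9}_\x{4E2D}\x{6587}.txt";

my $kept   = sanitize($name);          # e-acute survives, the CJK characters do not
my $strict = sanitize_ascii($name);    # only ASCII survives

# Show the surviving code points without printing wide characters.
for my $result ( $kept, $strict ) {
    print join( ' ', map { sprintf 'U+%04X', ord } split //, $result ), "\n";
}
```

If the goal really is ASCII-only names, the threshold would need to be 128 rather than 256.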

Re^3: Contacting the author of a module? by Corion (Patriarch) on Jan 06, 2023 at 13:34 UTC
See also Text::CleanFragment, which downgrades `ö` to `o`, for example.
Re^3: Contacting the author of a module? by ikegami (Patriarch) on Jan 06, 2023 at 19:04 UTC
Your sub is equivalent to `sub sanitize { $_[0] =~ s/[^\x00-\xFF]//gr }`. But it's wrong if you're trying to generate file names Perl can handle. cp1252 can't encode U+0080 to U+009F, except for U+0081, U+008D, U+008F, U+0090 and U+009D. Fixed (needs `use Encode qw( encode decode );`): `sub sanitize { decode( "cp1252", encode( "cp1252", $_[0], sub{ "" } ) ) }`
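
The difference between the two approaches can be sketched as follows. This is an illustration under assumptions: the sub names `sanitize_latin1` and `sanitize_cp1252` and the sample name are made up, and the name is passed explicitly as the second argument to `encode`, which is presumably what the one-liner intends. The code reference given as the third argument is Encode's CHECK callback; it is called for each character that cannot be encoded, and returning an empty string drops that character.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Regex form: drop every character above U+00FF.
sub sanitize_latin1 {
    return $_[0] =~ s/[^\x00-\xFF]//gr;
}

# Round-trip form: drop every character cp1252 cannot represent.
sub sanitize_cp1252 {
    my ($name) = @_;
    my $bytes = encode( "cp1252", $name, sub { "" } );
    return decode( "cp1252", $bytes );
}

# Made-up sample containing:
#   U+00E9 (e-acute)      in Latin-1 and in cp1252
#   U+0085 (C1 control)   below 256, but not encodable in cp1252
#   U+20AC (euro sign)    above 255, but encodable in cp1252
#   U+263A (smiley)       in neither
my $name = "caf\x{E9}_\x{85}_\x{20AC}_\x{263A}.txt";

my $latin1 = sanitize_latin1($name);
my $cp1252 = sanitize_cp1252($name);

# Show the surviving code points without printing wide characters.
for my $result ( $latin1, $cp1252 ) {
    print join( ' ', map { sprintf 'U+%04X', ord } split //, $result ), "\n";
}
```

With this sample, the regex form keeps U+0085 and drops the euro sign, while the cp1252 round trip does the opposite. Which behaviour is appropriate depends on the encoding the file system API actually expects.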