in reply to Re: Contacting the author of a module?
in thread Contacting the author of a module?

I have a bunch of files whose names are full of Unicode characters. I've always been a bit confused by how Unicode and ISO-Latin co-exist in Perl, and I've never really had any need for Unicode, so I've not worried about it much. The *only* thing I want to do with these files is fix their names by eliminating the Unicode characters. I expect that the more Unicode-competent Perl folk will be horrified, but this is the routine I use to un-Unicode the file names:
# Remove all the Unicode characters
sub sanitize {
    my $unicode = $_[0];
    my $ascii;
    for my $char (split(//, $unicode)) {
        $ascii .= $char if ord($char) < 256;
    }
    return $ascii;
}

Re^3: Contacting the author of a module?
by Corion (Patriarch) on Jan 06, 2023 at 13:34 UTC
Re^3: Contacting the author of a module?
by ikegami (Patriarch) on Jan 06, 2023 at 19:04 UTC

    Your sub is equivalent to

    sub sanitize { $_[0] =~ s/[^\x00-\xFF]//gr }

    But that's wrong if you're trying to generate file names Perl can handle. cp1252 can't encode the characters U+0080 to U+009F, with the exception of U+0081, U+008D, U+008F, U+0090 and U+009D. Fixed:

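    use Encode qw( decode encode );

    # Round-trip through cp1252; the callback replaces each unencodable character with "".
    sub sanitize { decode( "cp1252", encode( "cp1252", $_[0], sub { "" } ) ) }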
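To make the difference between the two approaches concrete, here is a minimal sketch that runs both on a made-up file name. The names `sanitize_latin1`, `sanitize_cp1252` and `code_points` are hypothetical, as is the sample file name. It assumes Perl 5.14 or later (for `/r`) and the core Encode module, and it passes the string as the second argument to `encode`, with the callback as the third.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Regex approach: keep every code point up to U+00FF, drop the rest.
sub sanitize_latin1 { $_[0] =~ s/[^\x00-\xFF]//gr }

# Round-trip approach: keep only the characters cp1252 can encode.
sub sanitize_cp1252 {
    my ($name) = @_;    # work on a copy so the caller's string is untouched
    # The callback runs for each character cp1252 cannot encode;
    # returning "" drops that character from the output.
    return decode( "cp1252", encode( "cp1252", $name, sub { "" } ) );
}

# Render a string as a list of code points so the output stays ASCII.
sub code_points {
    return join " ", map { sprintf "U+%04X", ord } split //, $_[0];
}

# Hypothetical file name containing e-acute (U+00E9),
# a C1 control character (U+0085) and the euro sign (U+20AC).
my $name = "caf\x{E9}\x{85}\x{20AC}.txt";

print "latin1: ", code_points( sanitize_latin1($name) ), "\n";
print "cp1252: ", code_points( sanitize_cp1252($name) ), "\n";
```

With this input the regex version should keep U+0085 and drop the euro sign, while the round-trip version should do the opposite: U+0085 has no cp1252 encoding, but U+20AC does (byte 0x80). In other words, the round trip is not strictly a subset of the regex version, since it also lets through the handful of characters above U+00FF that cp1252 covers.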