PerlMonks  

Rename Windows files with Unicode chars

by mnooning (Beadle)
on Aug 27, 2016 at 01:32 UTC ( #1170561=perlquestion )

mnooning has asked for the wisdom of the Perl Monks concerning the following question:

I receive files on Windows 7 whose names contain wide characters. I need to rename them after stripping the non-ASCII characters from the names. Using opendir and then readdir (in a while loop) does not work, because readdir returns bytes!

Below is code that I also tried.


# File name = a1.pl
# Script to get rid of wide chars and non-ascii chars
# in a Windows 7 file name.
# This does not work.
# In the Windows 7 file explorer, the file name shows as
#   "z &#8206;ay&#8206; &#8206;Pow&#8206;.mp4"
# Note: There is at least one embedded wide char in the above
#       pasted file name.
# The Windows 7 command prompt shows the file name
# as
#   "z ?ay? ?Pow?.mp4".
use 5.14.2;
# From: how to read unicode filename
#       http://www.perlmonks.org/?node_id=536223
open fList, '-|:encoding(UTF-16LE)', 'cmd /U /C dir /W';
# Note: I tried to opendir and readdir. I got the shortened
#       8.3 character file name whenever a wide character
#       was in the file name. I could not rename.
foreach (<fList>) {
    utf8::encode($_);
    my $orig_name = $_;
    my $new_name = $_;
    if ($new_name =~ m/.mp4/i) {
        print " 1 orig_name is \"$orig_name\"\n";
        $new_name =~ s![^[:ascii:]]!!ig;
        print " 2 new name is \"$new_name\"\n";
        rename "$orig_name", "$new_name"; # Does not work
    }
}
__END__
In the results below, note that the end double quotes are not
at the end of the file name line! That should not be!

>a1.pl
 1 orig_name is "z &#915;ay&#915; &#915;Pow&#915;.mp4
"
 2 new name is "z ay Pow.mp4
"
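The stray closing quotes in that output suggest each line read from the pipe still carries its line ending, and the unescaped dot in m/.mp4/ matches any character before "mp4". Below is a minimal sketch of just the name-cleaning step, assuming the input arrives as decoded character strings with a trailing CRLF. The helper name ascii_name and the sample line are hypothetical, not from the original post. Asking cmd for "dir /B" rather than "dir /W" would also give one bare name per line. None of this by itself makes the built-in rename accept the original name; the replies address that part.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): derive an ASCII-only name
# from one line of directory output.
sub ascii_name {
    my ($name) = @_;
    $name =~ s/\R\z//;             # drop the trailing line ending (CRLF or LF)
    $name =~ s/[^[:ascii:]]//g;    # remove every character outside ASCII
    return $name;
}

# Assumed sample line: the name from the question, with U+200E marks
# and the CRLF that a line from the pipe would carry.
my $line = "z \x{200E}ay\x{200E} \x{200E}Pow\x{200E}.mp4\r\n";

if ($line =~ /\.mp4\s*\z/i) {      # escaped dot, anchored at the end of the line
    my $new_name = ascii_name($line);
    print "new name is \"$new_name\"\n";
}
```

With the line ending removed first, the closing quote lands on the same line as the name.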

Replies are listed 'Best First'.
Re: Rename Windows files with Unicode chars
by beech (Parson) on Aug 27, 2016 at 01:58 UTC
    For Unicode filenames on Windows you need to use Win32::Unicode.

    update: Example

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Win32::Unicode -native;

    listDir();
    open my($fh), '>:encoding(UTF-8)', qq{I-\x{2665}-Perl} or die $!;
    print $fh qq{I-\x{2665}-Perl};
    close $fh;
    listDir();
    rename qq{I-\x{2665}-Perl}, 'I-love-Perl';
    listDir();
    unlink 'I-love-Perl';

    sub listDir {
        my( $dir ) = grep defined, @_, '.';
        my $wdir = Win32::Unicode::Dir->new( );
        $wdir->open($dir) or die $!;
        for ($wdir->fetch) {
            next if /^\.{1,2}$/;
            my $full_path = "$dir/$_";
            if (file_type('f', $full_path)) {
                print "f $_\n";
            }
            elsif (file_type('d', $full_path)){
                print "d $_\n";
            }
        }
        $wdir->close or die $!;
        print "\n####\n\n";
    }
    __END__
    $ chcp 65001
    Active code page: 65001
    
    $ perl win32-unicode-native-to-ascii.pl
    f win32-unicode-native-to-ascii.pl
    
    ####
    
    f I-♥-Perl
    f win32-unicode-native-to-ascii.pl
    
    ####
    
    f I-love-Perl
    f win32-unicode-native-to-ascii.pl
    
    ####
    
    
    $

      The line

      rename qq{I-\x{2665}-Perl}, 'I-love-Perl';

      is cheating. You were able to type in the name of the file that was to be renamed because you already knew what the original file name was. The problem is that I will never have such knowledge beforehand.

      Can you show code that will read a file name in the given directory, complete with the file's non-ascii characters, save the original file name in some variable, then strip the non-ascii from the said file name, then rename the file (using the original, saved file name) to the new all-ascii file name?

        Hi,

        Did you forget about  sub listDir { ? How does listDir cheat at reading unicode filenames?

        :)
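To connect listDir with the renaming the question asks for, here is a hedged sketch: read the names, keep each original, strip the non-ASCII characters, and rename. The helper plan_renames is hypothetical. The calls file_type and renameW are assumed to be available from Win32::Unicode::File as that module documents them, and the Windows-only part is skipped on other systems. This is one possible arrangement, not the module author's recommended usage.

```perl
use strict;
use warnings;

# Hypothetical helper: return [old, new] pairs for names that contain
# non-ASCII characters. Names that would become empty, or that would clash
# with another name (compared case-insensitively), are left alone.
sub plan_renames {
    my @names = @_;
    my %taken = map { lc($_) => 1 } @names;   # names already in the directory
    my @plan;
    for my $old (@names) {
        (my $new = $old) =~ s/[^[:ascii:]]//g;
        next if $new eq $old;                 # nothing to strip
        next if $new eq '' or $taken{lc $new}++;
        push @plan, [ $old, $new ];
    }
    return @plan;
}

if ($^O eq 'MSWin32') {
    # Assumption: these modules install and load on the perl in use.
    require Win32::Unicode::Dir;
    require Win32::Unicode::File;

    my $dir  = '.';
    my $wdir = Win32::Unicode::Dir->new;
    $wdir->open($dir) or die "cannot open $dir";
    # Keep plain files only; fetch returns the names as character strings.
    my @files = grep { Win32::Unicode::File::file_type('f', "$dir/$_") } $wdir->fetch;
    $wdir->close;

    for my $pair (plan_renames(@files)) {
        my ($old, $new) = @$pair;             # $old is the saved original name
        Win32::Unicode::File::renameW("$dir/$old", "$dir/$new")
            or warn "could not rename a file to '$new'\n";
    }
}
```

Keeping the planning step separate from the Windows calls means the name logic can be checked on any system before touching real files.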

      Another option is the COM interface. More here:
      Unicode issues in Perl (from a windows perspective)
      www.i-programmer.info/programming/other-languages/1973-unicode-issues-in-perl.html
        Thanks for the mention. The article goes to some length describing the underlying encoding issues and how to deal with them, but for the OP's purpose the following code snippet extracted from the article should do it:
        use Win32::Console;
        Win32::Console::OutputCP( 65001 );
        use Devel::Peek;
        use Win32::OLE qw(in);
        binmode(STDOUT, ":utf8");
        Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);
        $obj = Win32::OLE->new('Scripting.FileSystemObject');
        $folder = $obj->GetFolder(".");
        $collection = $folder->{Files};
        foreach $value (in $collection) {
            $filename = $value->{Name};        # Name property of the File object
            next if ($filename !~ /\.rar$/i);  # only look at .rar files
            print $filename, "\n";
            Dump $filename;                    # show the scalar's internal representation
        }
        I haven't benchmarked it, but logically the Win32 API calls should be faster than calling into COM. Nevertheless, COM exposes the FileSystemObject methods, which might be convenient anyway.
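As a rough sketch of how the COM route could cover the renaming as well: the File object of Scripting.FileSystemObject has a writable Name property, so assigning to it renames the file. The helper ascii_name is hypothetical, the Windows-only part is skipped elsewhere, and error handling for name clashes is left out, so treat this as an outline under those assumptions rather than a finished tool.

```perl
use strict;
use warnings;

# Hypothetical helper: ASCII-only version of a name, or undef when nothing
# needs to change or nothing would be left.
sub ascii_name {
    my ($name) = @_;
    (my $new = $name) =~ s/[^[:ascii:]]//g;
    return ($new ne $name && $new ne '') ? $new : undef;
}

if ($^O eq 'MSWin32') {
    require Win32::OLE;
    Win32::OLE->Option(CP => Win32::OLE::CP_UTF8());   # exchange strings as UTF-8
    my $fso = Win32::OLE->new('Scripting.FileSystemObject')
        or die "cannot create Scripting.FileSystemObject";

    # Collect the File objects first, then rename, so the collection is
    # not changed while it is being enumerated.
    my @files = Win32::OLE::in($fso->GetFolder('.')->{Files});
    for my $file (@files) {
        my $new = ascii_name($file->{Name});
        next unless defined $new;
        $file->{Name} = $new;      # assigning Name renames the file
    }
}
```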
      Win32::Unicode does not compile under 5.24 (mswin32), so I cannot use that code.

        Hi,

        Win32::Unicode does not compile under 5.24 (mswin32),

        Are you sure about that?

        Looking at CPAN Testers Reports: Report for Win32-Unicode-0.38 all I see is a failing test -- meaning the module did compile and can be installed

        so I cannot use that code.

        Maybe you can use an older version of perl?
