PerlMonks  

Wide characters in Windows filenames with File::Copy

by slugger415 (Monk)
on Nov 20, 2023 at 19:10 UTC ( [id://11155724]=perlquestion )

slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hello esteemed monks, I am having trouble using wide characters in filenames on Windows 10 using File::Copy. The wide characters appear just fine if I write them to a text file, and I can manually copy/paste them into filenames in Windows Explorer, but if I use File::Copy they come out all wonky.

use File::Copy;
my $oldname = "file.txt";
my $newname = "Hildur_Guðnadóttir.txt";
copy($oldname, $newname);

This file name comes out looking like so:

Hildur_Gu├░nad├│ttir.txt

Yet if I open a file and write $newname to it, it displays just fine.

Any way to do this correctly with (or without) File::Copy ?

Thanks, Scott

Replies are listed 'Best First'.
Re: Wide characters in Windows filenames with File::Copy
by ikegami (Patriarch) on Nov 20, 2023 at 19:31 UTC

    Use Win32::LongPath's copyL instead.


    The file names passed to copy need to be strings of bytes. In unix, you'd encode the text to use as the name using the system's locale. In Windows, you'd encode it using the encoding returned by "cp".Win32::GetACP().

    The reason for this is that File::Copy's copy uses Win32's Win32::CopyFile, which exposes the CopyFileA system call. The "(A)NSI" system calls use the system's Active Code Page. The exception to this is when the program's manifest makes the program's Active Code Page 65001, UTF-8. I keep meaning to try this to change perl's Active Code Page to UTF-8. You'd still have to encode the file name, but with UTF-8 (or cp65001, the alias returned by the earlier snippet after this change).
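    A minimal sketch of how a name gets garbled like the one in the question, assuming the text was encoded as UTF-8 and those bytes were then read back in a single-byte code page. The helper name misread_utf8_as is hypothetical, and cp437 is only an assumption about which code page did the misreading; it happens to reproduce the string shown in the question.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is written in UTF-8
use Encode qw(encode decode);

# Hypothetical helper: encode $name as UTF-8, then misread those bytes
# as if they were text in the single-byte code page $cp.
sub misread_utf8_as {
    my ($name, $cp) = @_;
    return decode($cp, encode('UTF-8', $name));
}

binmode STDOUT, ':encoding(UTF-8)';
print misread_utf8_as("Hildur_Guðnadóttir.txt", 'cp437'), "\n";
# prints: Hildur_Gu├░nad├│ttir.txt
```

    Each of ð and ó becomes two bytes in UTF-8, and each of those bytes is then shown as its own character, which is why the garbled name is two characters longer than the intended one.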

    On an English machine, the encoding is probably cp1252. Fortunately, the file name in question can be encoded using Windows-1252. If you wanted to support file names that can't be encoded using your system's ACP, you'd have to change the program's ACP as mentioned above, or you'd have to use CopyFileW, the "(W)ide" or "Unicode" version of the system call, which takes UTF-16le strings. Win32::LongPath's copyL exposes this call. (It also munges the paths to allow longer paths, but this is transparent.) If it weren't already exposed, you could have used a module like FFI::Platypus or Win32::API to access it, or you could have written your own XS module.
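    A rough sketch of the two cases above: checking whether a name can be encoded in a given code page, and copying with copyL when it can't. The names fits_code_page and copy_wide are hypothetical. The sketch assumes Win32::LongPath exports copyL by default, that copyL takes decoded (character) strings, and that it returns true on success; check the module's documentation before relying on any of that.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Load Win32::LongPath only on Windows, so the helper below still
# compiles elsewhere.
use if $^O eq 'MSWin32', 'Win32::LongPath';

# Hypothetical helper: true if every character of $name has a mapping
# in the code page $cp (for example "cp1252").
sub fits_code_page {
    my ($name, $cp) = @_;
    my $copy = $name;              # encode() may modify its input under FB_CROAK
    return eval { encode($cp, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Hypothetical wrapper (Windows only): copy using the wide-character call
# that copyL exposes. Both arguments are decoded text, not encoded bytes.
sub copy_wide {
    my ($from, $to) = @_;
    die "copy_wide needs Windows and Win32::LongPath\n"
        unless $^O eq 'MSWin32';
    copyL($from, $to) or die "copyL failed: $!\n";
    return 1;
}
```

    With these, fits_code_page($name, "cp" . Win32::GetACP()) tells you whether the byte-string approach with File::Copy can work for a given name on the current machine. copy_wide($oldname, $newname) sidesteps the question, at the cost of an extra module.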

      This is one of the things that should have been changed and long forgotten about two decades ago. There is no sane reason for File::Copy to use the ACP, and there hasn't been for ages.

      Jenda
      1984 was supposed to be a warning,
      not a manual!

      thank you!!

Re: Wide characters in Windows filenames with File::Copy
by BillKSmith (Monsignor) on Nov 21, 2023 at 00:43 UTC
    The following code assumes that you want to create the file name encoded in a way that you can read it with the windows dir command.
    • Create a temp 'oldfile' in current directory.
    • Copy it to a 'newfile'
    • Delete the 'oldfile'.
    • Search the directory for 'newfile'
    • Verify that the 'newfile' name is recovered
    In order to do this, it is necessary to encode the name of 'newfile' for Windows before the copy, and to decode the name after it is read back in. Run the Windows dir command to verify that the file name appears as you intend.
    use strict;
    use warnings;
    use autodie;
    use utf8;
    use File::Copy;
    use Encode qw(encode decode);
    use Test::More tests => 1;
    use File::Temp;
    use Win32;

    my $cp = "cp" . Win32::GetACP();    # Update

    # Create the temporary 'oldfile'.
    my $oldname = tmpnam();
    open my $make, '>', $oldname;
    print $make "Any old thing\n";
    close $make;

    # Encode the name with the system's Active Code Page before the copy.
    #my $newname = encode('cp1252', "Hildur_Guðnadóttir.txt");
    my $name    = "Hildur_Guðnadóttir.txt";
    my $newname = encode($cp, $name);
    copy($oldname, $newname) or die "Copy failed: $!\n";
    unlink $oldname;

    # Search the current directory for 'newfile'.
    opendir my $dh, '.';
    my $readbackname;
    while (1) {
        $readbackname = readdir $dh;
        die "File not found\n" if !defined($readbackname);
        last if $readbackname =~ m/^Hildur_Gu.nad.+ttir\.+txt/;
    }

    # Decode the name that was read back and compare it with the original text.
    #$readbackname = decode('cp1252', $readbackname);
    $readbackname = decode($cp, $readbackname);
    is($readbackname, $name, 'round trip');

    UPDATE: Corrected code per ikegami's comment Re^2: Wide characters in Windows filenames with File::Copy below.

    Bill

      Not all systems use Windows-1252 as their ACP. You should be using "cp".Win32::GetACP() (and use Win32;) instead of "cp1252".

      thank you Bill!
