The Perl 5.10 todo wish list states that functions such as
chdir, opendir, readdir, readlink, rename and rmdir "could potentially accept Unicode filenames either as input or output".
The Windows default encoding is UTF-16LE, but the console 'dir' command will only return ANSI names. Thus Unicode characters that have no equivalent in the ANSI code page are replaced with "?", even if you invoke the console with the Unicode switch (cmd.exe /u), change the code page to 65001 (which is UTF-8 on Windows) and use the Lucida Console TrueType font, which supports Unicode.
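To illustrate where the "?" characters come from, here is a minimal sketch using the core Encode module. It is not what 'dir' does internally, only the same kind of lossy conversion. The shortened file name and the choice of cp1252 as the ANSI code page are my assumptions; your code page may differ.

```perl
use strict;
use warnings;
use utf8;                  # this source contains UTF-8 literals
use Encode qw(encode);

# Hypothetical name, shortened from the one used further down in this post.
my $name = "は群馬県.txt";

# cp1252 stands in for a Western ANSI code page (assumption).
# With the default CHECK value, encode() substitutes '?' for every
# character the target encoding cannot represent.
my $ansi = encode('cp1252', $name);

print "$ansi\n";           # prints ????.txt
```

On a machine whose ANSI code page does contain these characters (cp932, for example) the same conversion would not lose them.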
A workaround is to use the COM facilities provided by Windows (in this case Scripting.FileSystemObject), which provide a much higher level of abstraction, or to use the API as pointed out in this thread.
Prompted by your query, I tried to read a file with Japanese characters in its name that resides in the current folder and then move it to another folder.
The filename is "は群馬県高崎市を拠点に、様々なメディ.txt". Just create a new file and copy/paste this as its name. (I don't know what it means; I just googled for 'japanese' and this turned up, so don't flame me if it means something bad!) You also need to have the appropriate fonts installed.
Since opendir, readdir, rename etc. do not support Unicode, you have to resort to the Scripting.FileSystemObject methods and properties, which accept Unicode.
This is the actual code:
use Win32::OLE qw(in);
use Devel::Peek;
# CP_UTF8 is very important as it translates between Perl strings and
# Unicode strings used by the OLE interface
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);
$obj = Win32::OLE->new('Scripting.FileSystemObject');
$folder = $obj->GetFolder(".");
$collection= $folder->{Files};
mkdir("c:\\newfolder") or die "Cannot create c:\\newfolder: $!";
foreach $value (in $collection) {
    $filename = $value->{Name};
    # only .txt files: the dot is escaped and the match is anchored at the end
    next if ($filename !~ /\.txt$/i);
    Dump("$filename"); # check if the utf8 flag is on
    $file = $obj->GetFile("$filename");
    $file->Move("c:\\newfolder\\$filename");
    print(Win32::OLE->LastError() || "success\n\n");
}
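If you only want to know whether the utf8 flag is on, without reading the full Devel::Peek dump, a smaller check could look like the sketch below. The file name here is a hard-coded, hypothetical stand-in for a name returned by the FileSystemObject; utf8::is_utf8 is a core function.

```perl
use strict;
use warnings;
use utf8;                  # this source contains UTF-8 literals

# Hypothetical stand-in for a name coming back from the OLE interface.
my $filename = "は群馬県.txt";

# True when Perl stores the string internally as UTF-8 characters.
my $flagged = utf8::is_utf8($filename) ? 1 : 0;

# Same test as in the loop above: dot escaped, match anchored.
my $is_txt = $filename =~ /\.txt$/i ? 1 : 0;

# Encode output explicitly so wide characters do not trigger a warning.
binmode(STDOUT, ':encoding(UTF-8)');
printf "%s: utf8 flag %s, %d characters, txt=%d\n",
    $filename, ($flagged ? 'on' : 'off'), length($filename), $is_txt;
```

Whether the console can actually display the characters still depends on the code page and font issues described at the top.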
What puzzles me is that you say you don't see the correct filename in Explorer when you should. This will only work if you have support for Asian languages enabled (Regional Settings), in which case you should be able to see the Japanese name in Explorer as above.