PerlMonks  

Seeking help for copying recursive folders having some folder/file names in Chinese or Japanese

by aksjain (Acolyte)
on Jan 19, 2015 at 11:23 UTC (#1113737=perlquestion)

aksjain has asked for the wisdom of the Perl Monks concerning the following question:

As part of my first Perl script project, I need to copy some folders (whose folder or file names contain Chinese or Japanese characters) from one directory to another. I tried using dircopy from "File::Copy::Recursive", but the copied directories have mangled names like:

Original name => Copied directory name

テーブル => A95B~1

Can somebody please help me solve this problem? I am using a very simple line of code:

use File::Copy::Recursive qw(dircopy);
my $num_of_files_and_dirs = dircopy($source_dir, $target_dir) or die "Copying Server failed: $!";

Re: Seeking help for copying recursive folders having some folder/file names in Chinese or Japanese
by BrowserUk (Patriarch) on Jan 19, 2015 at 11:33 UTC

    Seeing the short (8.3) pathname A95B~1 in your example, I assume you are on Windows, in which case try Win32::Unicode::cptreeW($from, $to).


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

      Thanks for your response. Win32::Unicode::cptreeW works perfectly for copying the folder, but it does have an optimization that skips empty folders.

      Is there a way to turn off this optimization? I need exactly the same directory structure at the destination as at the source.

        but it does have an optimization that skips empty folders.

        It's not an optimisation as such, just a side effect of processing the tree the 'unix way': a directory is only created when a file is about to be copied into it and the directory does not already exist.
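        If you want to stay in pure Perl, the alternative to the 'unix way' is to create every directory explicitly as the tree is walked, rather than on demand when a file arrives. Below is a minimal sketch using only core modules (File::Find, File::Path, File::Copy, File::Spec); copy_tree_with_empty_dirs is a hypothetical name. File::Find's find visits a directory before its contents, so each directory exists before any file is copied into it. Caveat: on Windows, Perl's built-in file functions go through the ANSI API, so this sketch by itself does not preserve Chinese or Japanese names there; it only shows how empty directories can be kept.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Path qw(make_path);
use File::Copy qw(copy);
use File::Spec;

# Hypothetical helper: copy_tree_with_empty_dirs($src, $dst)
# Creates every directory found under $src (including empty ones) and
# copies each regular file to the matching place under $dst.
# Assumption: names survive Perl's byte-oriented file API, which holds on
# Unix-like systems but NOT for non-ANSI names on Windows.
sub copy_tree_with_empty_dirs {
    my ($src, $dst) = @_;
    find({
        no_chdir => 1,    # keep full paths in $File::Find::name
        wanted   => sub {
            my $path = $File::Find::name;
            my $rel  = File::Spec->abs2rel($path, $src);
            # abs2rel returns '.' for the root of the source tree itself.
            my $target = $rel eq File::Spec->curdir
                       ? $dst
                       : File::Spec->catfile($dst, $rel);
            if (-d $path) {
                # Create the directory whether or not it contains files.
                make_path($target) unless -d $target;
            }
            elsif (-f $path) {
                copy($path, $target) or die "Cannot copy $path: $!";
            }
            # Anything else (sockets, devices, ...) is skipped.
        },
    }, $src);
    return;
}
```

        Usage: copy_tree_with_empty_dirs($source_dir, $target_dir); the destination root is created if it does not exist.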

        There is no architected API mechanism for changing this behaviour; nor can I see any easy way it could be retro-fitted or monkey patched to do so.

        My alternative would be to simply shell out to xcopy: system qq[ xcopy /E/I/C/O srcpath\\* destpath\\ ];.

        (You might also want to consider /G, /H, /R and /U; see xcopy /? for details.)

        Because it is supplied with the OS, it can copy any file or directory name that the OS allowed you to create. It is also significantly faster on deep trees.
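        For reference, here is a minimal sketch of the same xcopy call using the list form of system, with an exit-status check. xcopy_command and run_xcopy are hypothetical helpers. The sketch assumes the source and destination root paths are plain ASCII: as far as I know, Perl's system on Windows builds the command line in the ANSI code page, so Chinese or Japanese characters in the roots themselves may not survive. Names below the roots are handled by xcopy, not by Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: build the xcopy argument list from the reply.
#   /E  copy subdirectories, including empty ones
#   /I  treat the destination as a directory if it does not exist
#   /C  continue copying even if errors occur
#   /O  copy file ownership and ACL information
sub xcopy_command {
    my ($src, $dst) = @_;
    return ('xcopy', '/E', '/I', '/C', '/O', "$src\\*", "$dst\\");
}

# Hypothetical helper: run the copy, failing loudly on a non-zero exit code.
# Assumption: $src and $dst are ASCII-only paths (see lead-in).
sub run_xcopy {
    my ($src, $dst) = @_;
    die "xcopy is only available on Windows\n" unless $^O eq 'MSWin32';
    system(xcopy_command($src, $dst)) == 0
        or die "xcopy failed with exit code ", $? >> 8, "\n";
    return;
}
```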


Re: Seeking help for copying recursive folders having some folder/file names in Chinese or Japanese
by Anonymous Monk on Jan 19, 2015 at 12:15 UTC
      At the beginning of the document it says:
      "Windows stores filenames in Unicode, encoded in UTF16"

      That's not completely right. NTFS (like most Unix/Linux file systems) is encoding-agnostic. It just sees filenames as arrays of 16-bit wchar_t integers that are not required in any way to be valid UTF-16 sequences.

      For most C/C++ applications, which can handle wchar_t data directly, this is a non-issue. For Perl it is an issue: file names that are not valid UTF-16 cannot be converted to UTF-8, so modules like Win32::Unicode that perform this conversion internally will fail on them.
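      To see the failure in isolation, here is a minimal sketch using the core Encode module. It assumes the raw name is available as UTF-16LE bytes (how a module actually obtains them is outside this sketch), and it is not Win32::Unicode's real code path; utf16le_name_to_utf8 is a hypothetical name. With FB_CROAK, decode dies on a malformed surrogate instead of silently substituting U+FFFD.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: convert a raw UTF-16LE file name (bytes) to UTF-8
# bytes, dying on input that is not valid UTF-16.
sub utf16le_name_to_utf8 {
    my ($bytes) = @_;
    # LEAVE_SRC stops Encode from modifying the caller's string.
    my $chars = decode('UTF-16LE', $bytes, FB_CROAK | LEAVE_SRC);
    return encode('UTF-8', $chars);
}

# The name from the question, as UTF-16LE code units, converts cleanly.
my $good = pack('v*', 0x30C6, 0x30FC, 0x30D6, 0x30EB);
print length(utf16le_name_to_utf8($good)), " UTF-8 bytes\n";  # 4 chars x 3 bytes = 12

# An unpaired high surrogate (0xD800) followed by 'A' cannot be converted.
my $bad = pack('v*', 0xD800, 0x0041);
my $ok  = eval { utf16le_name_to_utf8($bad); 1 };
print $ok ? "converted\n" : "conversion failed: $@";
```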

      Admittedly, for most scripts this is not an issue, as no sane application creates (or lets the user create) files with names that are not valid UTF-16. Still, malicious or merely buggy software may do so.
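      A script that wants to spot such names before handing them to a conversion routine could check the raw 16-bit code units itself. Below is a minimal sketch of what "valid UTF-16" means for such an array: every high surrogate (0xD800-0xDBFF) must be immediately followed by a low surrogate (0xDC00-0xDFFF), and no low surrogate may appear alone. is_valid_utf16 is a hypothetical helper, not part of any Win32 module, and it assumes you already have the code units as integers.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a list of 16-bit code units forms a valid
# UTF-16 sequence (surrogates correctly paired), false otherwise.
sub is_valid_utf16 {
    my @units = @_;
    for (my $i = 0; $i < @units; $i++) {
        my $u = $units[$i];
        if ($u >= 0xD800 && $u <= 0xDBFF) {        # high surrogate
            my $next = $units[$i + 1];
            return 0
                unless defined $next && $next >= 0xDC00 && $next <= 0xDFFF;
            $i++;                                   # skip the paired low surrogate
        }
        elsif ($u >= 0xDC00 && $u <= 0xDFFF) {      # unpaired low surrogate
            return 0;
        }
    }
    return 1;
}

# The name from the question, as UTF-16 code units: valid.
print is_valid_utf16(0x30C6, 0x30FC, 0x30D6, 0x30EB) ? "valid\n" : "invalid\n";
# A lone high surrogate followed by 'A': not valid UTF-16.
print is_valid_utf16(0xD800, 0x0041) ? "valid\n" : "invalid\n";
```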

      Update: Well, NTFS is not completely encoding-agnostic because it is case-insensitive. It has the metadata file $UpCase that defines how wchar_t characters are converted to upper case.

      why this happens and what to do

      Win32::Unicode is much more convenient than Win32::OLE
