PerlMonks  

At the beginning of the document it says:
Windows stores filenames in Unicode, encoded in UTF16

That's not completely right. NTFS (like most Unix/Linux file systems) is encoding-agnostic. It just sees filenames as arrays of wchar_t integers (16-bit code units) that are not required in any way to be valid UTF-16 sequences.
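To make "not required to be valid UTF-16" concrete: the only thing that makes an array of 16-bit code units ill-formed UTF-16 is an unpaired surrogate, meaning a high surrogate (0xD800-0xDBFF) not immediately followed by a low surrogate (0xDC00-0xDFFF), or a low surrogate on its own. A minimal check can be sketched like this. It is Python rather than Perl, purely for illustration; `is_valid_utf16` and the sample names are hypothetical and not part of any Win32 or Perl API.

```python
def is_valid_utf16(units):
    """Return True if a sequence of 16-bit code units is well-formed UTF-16.

    A high surrogate must be immediately followed by a low surrogate,
    and a low surrogate must never appear on its own.
    """
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:              # high surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2                         # valid surrogate pair
                continue
            return False                       # unpaired high surrogate
        if 0xDC00 <= u <= 0xDFFF:              # low surrogate with no high before it
            return False
        i += 1
    return True


# "a.txt" as code units: well-formed.
ok_name = [0x61, 0x2E, 0x74, 0x78, 0x74]
# The same name with a lone high surrogate inserted: still just an array
# of 16-bit integers, but no longer valid UTF-16.
bad_name = [0x61, 0xD800, 0x2E, 0x74, 0x78, 0x74]

print(is_valid_utf16(ok_name))   # True
print(is_valid_utf16(bad_name))  # False
```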

For most C/C++ applications, which can handle wchar_t data directly, this is a non-issue. For Perl it is an issue: file names that are not valid UTF-16 cannot be converted to UTF-8, so modules like Win32::Unicode that do that conversion internally will fail on them.
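The failing conversion can be reproduced with a strict UTF-16 to UTF-8 conversion. This is a Python sketch, assuming the name arrives as 16-bit code units in little-endian order as on Windows; `units_to_utf8` is a hypothetical helper standing in for whatever conversion a module performs internally, not Win32::Unicode's actual code.

```python
import struct


def units_to_utf8(units):
    """Convert 16-bit code units to UTF-8 bytes.

    Uses Python's strict codecs, so it raises UnicodeDecodeError when the
    units are not well-formed UTF-16 (e.g. they contain a lone surrogate).
    """
    raw = struct.pack("<%dH" % len(units), *units)    # little-endian, as on Windows
    return raw.decode("utf-16-le").encode("utf-8")    # strict decode, then encode


print(units_to_utf8([0x61, 0x2E, 0x74, 0x78, 0x74]))  # b'a.txt'

try:
    units_to_utf8([0x61, 0xD800, 0x2E, 0x74, 0x78, 0x74])
except UnicodeDecodeError as exc:
    print("cannot convert:", exc.reason)
```

There is simply no UTF-8 byte sequence that represents a lone surrogate, so any code path that insists on a UTF-8 (or Perl character string) form of the name has nothing valid to produce.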

Admittedly, for most scripts this is not a problem, as no sane application creates (or lets the user create) files with names that are not valid UTF-16. Still, malicious or just buggy software may do it.

Update: Well, NTFS is not completely encoding-agnostic, because it is case-insensitive. It has a metadata file, $UpCase, that defines how wchar_t characters are converted to upper case.
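Conceptually, such a case-insensitive comparison maps each 16-bit unit through a per-unit table before comparing, so it still works on names that are not valid UTF-16. The sketch below assumes that model with a 65536-entry table; it builds a hypothetical stand-in table from Python's `str.upper()` instead of reading the real $UpCase file, whose contents may differ in details. `make_upcase_table` and `names_equal` are illustrative names only.

```python
def make_upcase_table():
    """Build a hypothetical 65536-entry upcase table, one entry per code unit.

    Only an approximation: units whose Python uppercase form is a single
    BMP character are mapped; everything else (including surrogates and
    characters like U+00DF that uppercase to two characters) maps to itself.
    """
    table = list(range(0x10000))
    for u in range(0x10000):
        if 0xD800 <= u <= 0xDFFF:
            continue                      # surrogates map to themselves
        up = chr(u).upper()
        if len(up) == 1 and ord(up) <= 0xFFFF:
            table[u] = ord(up)
    return table


UPCASE = make_upcase_table()


def names_equal(a, b, upcase=UPCASE):
    """Compare two names unit by unit after mapping each unit through the table."""
    return len(a) == len(b) and all(upcase[x] == upcase[y] for x, y in zip(a, b))


print(names_equal([ord(c) for c in "Readme.TXT"], [ord(c) for c in "README.txt"]))  # True
# A name containing a lone surrogate still compares fine: the table never decodes UTF-16.
print(names_equal([0x61, 0xD800], [0x41, 0xD800]))  # True
```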


In reply to Re^2: Seeking help for copying recursive folders having some folder/file names in Chinese or japanese by salva
in thread Seeking help for copying recursive folders having some folder/file names in Chinese or japanese by aksjain
