PerlMonks  

Re^2: How to fix wrongly encoded filenames?

by mcc001 (Initiate)
on Mar 17, 2014 at 17:53 UTC ( [id://1078641]=note )


in reply to Re: How to fix wrongly encoded filenames?
in thread How to fix wrongly encoded filenames?

Hi, thanks for the code! :) One problem is reading the corrupted filenames off the filesystem without getting the shortened 8.3 form.

What happens is: 7zip, which is used to unpack the TAR archive, doesn't know the encoding scheme of the filenames and tags them as cp437 (while they are actually latin1). Windows sees the cp437 flag and converts the latin1 filename, as if it were cp437, to the underlying UTF-16 (I think this is what is used internally). The result is a latin1 byte string interpreted as cp437 and converted to UTF-16, which yields encoding nonsense.
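
The garbling described here can be reproduced in a few lines with the core Encode module. This is only a sketch: the sample name "Müller.txt" is made up, and it assumes the code page involved really is cp437 (code page 437). It prints code points rather than characters so that the console encoding does not interfere.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "Müller.txt" as latin1 bytes, as stored in the TAR archive.
my $latin1_bytes = "M\xFCller.txt";

# What the name should have become: the bytes interpreted as latin1.
my $intended = decode('iso-8859-1', $latin1_bytes);

# What presumably happens instead: the same bytes interpreted as cp437.
# Byte 0xFC is "ü" in latin1 but a superscript "n" (U+207F) in cp437.
my $garbled = decode('cp437', $latin1_bytes);

# Show both results as lists of code points.
for my $pair ([intended => $intended], [garbled => $garbled]) {
    my ($label, $name) = @$pair;
    my @points = map { sprintf 'U+%04X', ord($_) } split //, $name;
    print "$label: @points\n";
}
```

The two strings differ only at the position of the non-ASCII byte, which is why plain ASCII filenames survive the extraction unharmed.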

The logic I want to implement in Perl (and currently don't know how) is:
7zip is used to unpack the TAR archives. From the output of 7zip I get a list of latin1-encoded filenames while 7zip is extracting them.
Take filename by filename off the list; if one is not found, it is a filename whose encoding is garbled.
For those filenames, do:
Read the garbled names (and *NOT* the 8.3 form of the files, which Windows is not able to display correctly) off the filesystem. Convert them from cp437 back to latin1. Check whether they can be found now. If so, rename the garbled file to the corresponding filename from the list (the output from 7zip).
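
The matching part of these steps could be sketched roughly as follows. The helper names `repair_name` and `plan_renames` and the sample data are hypothetical. The sketch assumes the expected names (from the 7zip output) and the names read from disk are already available as Perl character strings. Reading the long Unicode names on Windows and performing the actual rename are left out, since they need a Unicode-aware file API that core Perl's readdir and rename may not provide there.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Undo the mis-decoding: turn the garbled characters back into the original
# bytes via cp437, then read those same bytes as latin1. Returns undef if
# the name contains characters that do not exist in cp437.
sub repair_name {
    my ($garbled) = @_;
    my $bytes = eval {
        encode('cp437', $garbled, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return unless defined $bytes;
    return decode('iso-8859-1', $bytes);
}

# Build a rename plan: garbled on-disk name => expected name.
# Both arguments are array refs of character strings.
sub plan_renames {
    my ($expected, $on_disk) = @_;
    my %present = map { $_ => 1 } @$on_disk;
    my %missing = map { $_ => 1 } grep { !$present{$_} } @$expected;
    my %plan;
    for my $name (@$on_disk) {
        my $fixed = repair_name($name);
        next unless defined $fixed && $fixed ne $name && $missing{$fixed};
        $plan{$name} = $fixed;
    }
    return \%plan;
}

# Hypothetical data standing in for the 7zip listing and a directory read.
my @expected = ("readme.txt", "M\x{FC}ller.txt");
my @on_disk  = ("readme.txt", "M\x{207F}ller.txt");

my $plan = plan_renames(\@expected, \@on_disk);
printf "%d file(s) to rename\n", scalar keys %$plan;
```

Note that the repair step keeps the bytes unchanged and only changes how they are interpreted, which is what the "encoding scheme flag" idea amounts to in Perl terms. Only names that are missing from disk and that reappear after the repair end up in the plan, so correctly extracted files are never touched.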

The first goal is to read the full (and garbled) filename from the filesystem.
The second goal is to change the "encoding scheme flag" of the filename's byte string *without* changing the bytes themselves.

I cannot identify the part of the code above that reads the filenames off the filesystem, which is definitely a result of my being a novice and no monk...;)

How can I implement the algorithm described above?
Thank you very much in advance!
Best regards, mcc

Re^3: How to fix wrongly encoded filenames?
by Anonymous Monk on Mar 18, 2014 at 07:16 UTC
