comment on

Hi, thank for the code! :) One problem is to read the corrupted filenames off the filesystem without getting the shortened 8.3 form.

What happens is: 7zip, which is used to unpack the the TAR-archive, doesnt know the encoding scheme of the filenames and tagged them as cdp437 (while they are latin1). Windows sees the cdp437-flag and encodes the latin1-filename from cdp437 to the underlaying UTF16 (I think this is used internally). The result is a latin1-bytestring converted from cdp437 to UTF16 which results in encoding-nonsense.

The logik which I want to implement (and currently dont know how) in Perl is:
7zip is used to unpack the TAR archives. From the output of 7zip I get a list of latin1-encoded filenames while 7zip is extracting those.
Take filename by filename off the list, and if not found, it is a filename which encoding is garbled.
For those filenames, do:
Read the encoding nonsense (and *NOT* the 8.3 form of the files, windows is not able to display correctly) off the filesystem. Decode them from cdp437 and encode them to latin1. Check whether they could be found now. If so, rename the garbled filename to the corresponding filename of the list (output from 7zip).

First goal is to read the full (and garbled) filename from the filesystem.
Second goal is to change the "encoding scheme flag" of the bytestring of the filename *without* changing the bytes themselves.

I cannot identify the part of the code above, which reads the filenames off the filesystem, which definetly is a result of my being a novice and no monk...;)

How can I implement the algorithm described above?
Thank you very mauch in advance!
Best regards, mcc

In reply to Re^2: How to fix wrongly encoded filenames? by mcc001
in thread How to fix wrongly encoded filenames? by mcc001

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


laziness, impatience, and hubris
	PerlMonks