Hi, thanks for the code! :) One problem is reading the corrupted filenames off the filesystem without getting the shortened 8.3 form.

What happens is: 7zip, which is used to unpack the TAR archive, doesn't know the encoding of the filenames and tags them as cp437 (while they are actually latin1). Windows sees the cp437 flag and converts the latin1 filename from cp437 to the underlying UTF-16 (I think this is what is used internally). The result is a latin1 byte string that was converted from cp437 to UTF-16, which is encoding nonsense.
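
The effect described above can be reproduced with the core Encode module. This is only a sketch: the sample name "Bär.txt" is made up for illustration, and it assumes the misinterpretation is exactly "latin1 bytes read as cp437".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "Bär.txt" as latin1 bytes, as it would sit in the TAR archive.
my $latin1_bytes = "B\xE4r.txt";

# What the name was meant to be.
my $intended = decode('iso-8859-1', $latin1_bytes);

# What comes out when the same bytes are taken for cp437:
# byte 0xE4 is U+03A3 (Greek capital sigma) in cp437.
my $garbled = decode('cp437', $latin1_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
printf "intended: %s\ngarbled:  %s\n", $intended, $garbled;
```

The bytes are identical in both cases; only the code page used to interpret them differs.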

The logic I want to implement in Perl (and currently don't know how) is:
7zip is used to unpack the TAR archives. From the output of 7zip I get a list of latin1-encoded filenames while it is extracting them.
Take filename by filename off the list; if one is not found on disk, it is a filename whose encoding is garbled.
For those filenames, do:
Read the garbled names (and *NOT* the 8.3 forms) of the files that Windows cannot display correctly off the filesystem. Decode them from cp437 and encode them to latin1. Check whether they now match an entry on the list. If so, rename the garbled file to the corresponding filename from the list (the output of 7zip).
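
The steps above could be sketched roughly as follows. This is not a finished solution: `ungarble` and `rename_plan` are hypothetical helper names, both lists are assumed to already be Perl character strings, and actually reading the long names from a Windows directory (and doing the rename) is left out, since that needs a Unicode-aware API. Note that, seen from Perl, the conversion runs as "encode to cp437 to get the original bytes back, then decode those bytes as latin1".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: recover the intended name from a garbled one.
sub ungarble {
    my ($garbled) = @_;
    # Back to the original bytes. If the name contains characters cp437
    # cannot represent, encode() croaks and we return undef.
    my $bytes = eval {
        encode('cp437', $garbled, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return unless defined $bytes;
    # The same bytes, now read as latin1.
    return decode('iso-8859-1', $bytes);
}

# Hypothetical helper: map garbled on-disk names to the names 7zip reported.
sub rename_plan {
    my ($expected, $on_disk) = @_;    # array refs of character strings
    my %present = map { $_ => 1 } @$on_disk;
    my %missing = map { $_ => 1 } grep { !$present{$_} } @$expected;
    my %plan;
    for my $name (@$on_disk) {
        my $fixed = ungarble($name);
        next unless defined $fixed && $fixed ne $name && $missing{$fixed};
        $plan{$name} = $fixed;        # garbled name => name from the 7zip list
    }
    return \%plan;
}

# Made-up sample data for illustration.
my @expected = ("readme.txt", "B\x{E4}r.txt");     # from the 7zip output
my @on_disk  = ("readme.txt", "B\x{3A3}r.txt");    # as found on the filesystem
my $plan = rename_plan(\@expected, \@on_disk);

binmode(STDOUT, ':encoding(UTF-8)');
print "$_ -> $plan->{$_}\n" for sort keys %$plan;
```

Only names that are missing from disk and that reappear after the conversion end up in the plan, so files that were extracted correctly are never touched.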

The first goal is to read the full (and garbled) filename from the filesystem.
The second goal is to change the "encoding scheme flag" of the filename's byte string *without* changing the bytes themselves.

I cannot identify the part of the code above that reads the filenames off the filesystem, which definitely is a result of my being a novice and no monk... ;)

How can I implement the algorithm described above?
Thank you very much in advance!
Best regards, mcc


In reply to Re^2: How to fix wrongly encoded filenames? by mcc001
in thread How to fix wrongly encoded filenames? by mcc001
