Hi,
thanks for the code! :)
One problem is to read the corrupted filenames off the filesystem without getting the shortened 8.3 form.
What happens is:
7zip, which is used to unpack the TAR archive, doesn't know the encoding scheme of the filenames and tags them as cp437 (while they are latin1). Windows sees the cp437 flag and converts the latin1 filename from cp437 to the underlying UTF-16 (I think this is used internally). The result is a latin1 bytestring converted from cp437 to UTF-16, which yields encoding nonsense.
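To make the effect concrete, here is a minimal sketch of what I believe happens to a single name. It assumes the core Encode module; the filename "Käse.txt" is a made-up example and not one of my real files.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical example name: "Käse.txt" as latin1 bytes (ä is byte 0xE4).
my $latin1_bytes = "K\xE4se.txt";

# What seems to happen on extraction: the bytes are interpreted as cp437.
my $garbled = decode('cp437', $latin1_bytes);

# What should have happened: the same bytes interpreted as latin1.
my $intended = decode('iso-8859-1', $latin1_bytes);

# Compare the second character of both results by code point.
printf "garbled:  U+%04X\n", ord(substr($garbled, 1, 1));   # U+03A3 (Greek capital sigma)
printf "intended: U+%04X\n", ord(substr($intended, 1, 1));  # U+00E4 (a with diaeresis)
```

The bytes are identical in both cases; only the code page used to interpret them differs.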
The logic which I want to implement (and currently don't
know how) in Perl is:
7zip is used to unpack the TAR archives. From the output of 7zip I get a list of latin1-encoded filenames while 7zip is extracting those.
Take filename by filename off the list, and if one is not found, it is a filename whose encoding is garbled.
For those filenames, do:
Read the encoding nonsense (and *NOT* the 8.3 form of the files, which Windows is not able to display correctly) off the filesystem. Undo the cp437 interpretation to get the original bytes back and read those bytes as latin1. Check whether the resulting name can be found in the list now. If so,
rename the garbled filename to the corresponding filename of the list (output from 7zip).
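The matching part of the steps above could look roughly like this. It is only a sketch under assumptions: both lists are already decoded Perl character strings, the helper names repair_name and plan_renames are made up by me, and reading the full Unicode names off the Windows filesystem (the part I do not know how to do) is left out entirely.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: undo the cp437 interpretation of one garbled name.
# Returns nothing if the name contains characters that cp437 cannot represent.
sub repair_name {
    my ($garbled) = @_;
    my $copy = $garbled;    # encode() with a CHECK value may modify its argument
    my $bytes = eval { encode('cp437', $copy, Encode::FB_CROAK) };
    return unless defined $bytes;
    return decode('iso-8859-1', $bytes);    # same bytes, read as latin1
}

# Hypothetical helper: map each garbled on-disk name to its expected name.
sub plan_renames {
    my ($expected, $on_disk) = @_;
    my %want = map { $_ => 1 } @$expected;
    my %have = map { $_ => 1 } @$on_disk;
    my %plan;
    for my $name (@$on_disk) {
        next if $want{$name};               # already correct
        my $fixed = repair_name($name);
        next unless defined $fixed;
        next unless $want{$fixed};          # not a name 7zip reported
        next if $have{$fixed};              # would overwrite an existing file
        $plan{$name} = $fixed;
    }
    return \%plan;
}

# Usage with made-up names: one garbled entry, one plain ASCII entry.
my @expected = ("K\x{E4}se.txt", "plain.txt");
my @on_disk  = ("K\x{3A3}se.txt", "plain.txt");
my $plan     = plan_renames(\@expected, \@on_disk);

binmode(STDOUT, ':encoding(UTF-8)');
print "$_ => $plan->{$_}\n" for sort keys %$plan;
```

The actual rename would then be done per entry of the plan, with whatever call is able to handle the Unicode names on Windows.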
First goal is to read the full (and garbled) filename from the filesystem.
Second goal is to change the "encoding scheme flag" of the
bytestring of the filename *without* changing the bytes themselves.
I cannot identify the part of the code above which reads the filenames off the filesystem, which definitely is a result of my being a novice and no monk...;)
How can I implement the algorithm described above?
Thank you very much in advance!
Best regards,
mcc