nlevesque has asked for the wisdom of the Perl Monks concerning the following question:

Using Archive::Zip or Archive::Extract, the names of the files in my zip archive come out with altered characters (on Windows; I also tried under Cygwin and got the same results). For example, if I zip the files "CREATIVE SERVICES .txt" and "LACOURSIèRE.txt" with WinZip, I can unzip them properly with WinZip. But if I run my zip file through this:
    use Archive::Extract;
    my $ae = Archive::Extract->new( archive => 'example.zip' );
    my $ok = $ae->extract;
The files come out as "CREATIVE SERVICESÿ.txt" and "LACOURSIŠRE.txt". From some research, it seems the issue might not be Perl-related per se and could have something to do with the character set Perl uses; I'm not sure whether that is the same one the command prompt uses. Does anyone know what I should do to get the files extracted with their proper names? Thanks!

Re: Unzip help needed.
by ikegami (Patriarch) on Jul 24, 2009 at 21:34 UTC

    You're seeing the following swaps:
    "è" is U+00E8 ⇒ "Š" is U+0160
    " " is U+00A0* ⇒ "ÿ" is U+00FF

    I found that (using Encode's encode and decode):
    decode('cp1252', encode('cp437', chr(0x00E8))) eq chr(0x0160)
    decode('cp1252', encode('cp437', chr(0x00A0))) eq chr(0x00FF)
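    Both equalities can be checked with the core Encode module. The short script below is a sketch added for illustration: for each swap it prints the code point, the single byte that cp437 produces for it, and what that byte turns into when it is read back as cp1252.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Pairs from the list above: original character => character that appeared.
my @swaps = (
    [ 0x00E8, 0x0160 ],    # e with grave   => S with caron
    [ 0x00A0, 0x00FF ],    # no-break space => y with diaeresis
);

for my $pair (@swaps) {
    my ( $from, $to ) = @$pair;
    my $bytes = encode( 'cp437', chr($from) );    # one byte per character in cp437
    my $seen  = decode( 'cp1252', $bytes );       # same byte, read as cp1252
    printf "U+%04X -> byte 0x%02X -> U+%04X %s\n",
        $from, ord($bytes), ord($seen),
        ( $seen eq chr($to) ? 'ok' : 'MISMATCH' );
}

# Expected output:
#   U+00E8 -> byte 0x8A -> U+0160 ok
#   U+00A0 -> byte 0xFF -> U+00FF ok
```

    In both cases the byte itself never changes; only the code page used to interpret it does.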

    This is the only match I found. I looked at UTF-8, UCS-2le, iso-8859-* and (only) a few code pages.

    That explains what is happening. It doesn't determine who is doing what and who is to blame, but it's a start.

    * — Well, it could be something other than U+00A0, but it would be a mighty big coincidence.

      Thanks! Indeed it is a start, and it feels like something really simple to change, yet I still haven't figured out how to make the change needed to avoid the issue.
        Does anyone have any idea? I would like to avoid relying on third-party software such as "Command line Winzip" when this can be done in Perl... Thanks!
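      A minimal sketch of that change, offered as an untested assumption rather than a confirmed fix. It assumes, per the mapping above, that the archive stores the names as cp437 bytes (no UTF-8 name flag set) and that native Windows Perl hands those bytes unchanged to the file system, which reads them in the ANSI code page, taken here to be cp1252. The names fix_name and extract_fixed are hypothetical; the Archive::Zip calls used are read, members, fileName and extractToFileNamed. If a given Archive::Zip version has already decoded a name into characters, the cp437 step would not apply.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: re-encode a raw zip entry name, assumed to be
# stored as cp437 bytes, into the cp1252 bytes that the Windows ANSI
# file APIs are assumed to expect. Characters with no cp1252
# equivalent are replaced by '?' (Encode's default substitution).
sub fix_name {
    my ($raw) = @_;
    return encode( 'cp1252', decode( 'cp437', $raw ) );
}

# Hypothetical driver: extract every member of $zipfile, relative to
# the current directory, under its transcoded name.
sub extract_fixed {
    my ($zipfile) = @_;
    require Archive::Zip;    # loaded at run time; fix_name does not need it
    my $zip = Archive::Zip->new();
    $zip->read($zipfile) == Archive::Zip::AZ_OK()
        or die "Cannot read $zipfile\n";
    for my $member ( $zip->members ) {
        my $name = fix_name( $member->fileName );
        $member->extractToFileNamed($name) == Archive::Zip::AZ_OK()
            or die "Cannot extract $name\n";
    }
    return;
}

# Usage: perl fixzip.pl example.zip
extract_fixed($_) for @ARGV;
```

      The code-page pair is the assumption most likely to need adjusting on another system; everything else stays inside Perl.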