jck000 has asked for the wisdom of the Perl Monks concerning the following question:

I'm scraping a site with WWW::Mechanize and it works properly. However, one of the links I follow returns a zip file. I save the response content to disk, but when I later try to unzip the file, unzip complains that it is corrupt.

I tried $mech->save_content(), as well as printing to a file with binmode and various encoding options on print. Setting encoding to utf8 eliminates a "Wide character" error. But the problem still remains.

I'm using WWW::Mechanize because it's a secure site with a login. It sets a bunch of cookies and session ids, so switching to a different utility for this transaction is not possible.

Any guidance or direction would be appreciated.

Thanks in advance
Jack

UPDATE
It works properly from a browser.

Here are file listings of the same file:

-rw-r--r-- 1 jack3 jack3 337991 2009-03-19 23:48 auto_20090318_0610.zip
-rw-r--r-- 1 jack3 jack3 631747 2009-03-20 00:03 report.zip


Jack

Re: Zip file from WWW::Mechanize
by ikegami (Patriarch) on Mar 23, 2009 at 16:40 UTC

    Setting encoding to utf8 eliminates a "Wide character" error.

    It makes no sense to get a wide character error when printing out a zip file. Zip files are made of bytes, yet that warning only appears when printing a character higher than 255. Encoding the wide characters silences the warning, but it doesn't fix the fact that what you are trying to save isn't a zip file.
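A minimal sketch of the point above, using only core Perl: check that the data really is a byte string before writing it, and write it through a `:raw` handle instead of adding an encoding layer. The helper names `is_byte_string` and `write_binary` are made up for illustration. The commented usage assumes an already logged-in WWW::Mechanize object in `$mech` and a hypothetical `$zip_url`; `$mech->response->content` is the undecoded body of the HTTP::Response, and if the server applied a Content-Encoding, `$mech->response->decoded_content(charset => 'none')` should undo that without any charset decoding.

```perl
use strict;
use warnings;

# Return true if $data holds only byte values (no character above 255).
sub is_byte_string {
    my ($data) = @_;
    return $data !~ /[^\x00-\xFF]/;
}

# Write $data to $path with no encoding layer. Refuse non-byte data
# instead of silently encoding it, since encoded output is not a zip file.
sub write_binary {
    my ($path, $data) = @_;
    die "data contains wide characters; it is not raw zip bytes\n"
        unless is_byte_string($data);
    open my $fh, '>:raw', $path or die "Cannot open $path: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
    return length $data;
}

# Hypothetical usage (assumes $mech is logged in and $zip_url is the link):
#   $mech->get($zip_url);
#   write_binary('report.zip', $mech->response->content);
```

If `write_binary` dies here, the string was already decoded into characters somewhere upstream, and that is the place to look rather than the print statement.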

Re: Zip file from WWW::Mechanize
by MidLifeXis (Monsignor) on Mar 23, 2009 at 16:47 UTC

    What happens when you try to pull the zip file from a browser? Is it possible that the zip file is corrupted as it is being sent from the web server itself?

    --MidLifeXis

    The tomes, scrolls etc are dusty because they reside in a dusty old house, not because they're unused. --hangon in this post

      It works properly from a browser. Here are file listings of the same file:

      -rw-r--r-- 1 jack3 jack3 337991 2009-03-19 23:48 auto_20090318_0610.zip
      -rw-r--r-- 1 jack3 jack3 631747 2009-03-20 00:03 report.zip

      Jack

        The newer file is nearly twice the size. What's the output of
        od -c auto_20090318_0610.zip | head -n 4
        and
        od -c report.zip | head -n 4
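If od isn't handy, the same check can be sketched in Perl: read the first four bytes and compare them with the zip signatures. A normal zip archive starts with the local file header signature "PK\x03\x04", and an empty archive starts with "PK\x05\x06". The sub name `zip_signature` is made up for this example.

```perl
use strict;
use warnings;

# Report whether a file starts with a zip signature.
sub zip_signature {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 4);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return 'zip'       if $head eq "PK\x03\x04";   # local file header
    return 'empty zip' if $head eq "PK\x05\x06";   # end of central directory
    return 'not a zip';
}

# Check every file named on the command line.
for my $file (@ARGV) {
    printf "%s: %s\n", $file, zip_signature($file);
}
```

Saved under a name of your choosing, it would be run with both files as arguments, e.g. `perl checkzip.pl auto_20090318_0610.zip report.zip`. A file reported as 'not a zip' is worth opening in a text editor; it may turn out to be an HTML page rather than an archive.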

        Please include some code; preferably the smallest amount possible that still reproduces your problem. Perhaps there is something inherently wrong with the code that extra eyes may see.

        --MidLifeXis
