in reply to Re^3: Zip file from WWW::Mechanize
in thread Zip file from WWW::Mechanize

jack3:/usr/genoais/aiq/aiq/bin$ od -c auto_20090318_0610.zip | head -n 4
0000000 P K 003 004 024 \0 \0 \0 \b \0 a I s : S F
0000020 215 200 C & 005 \0 345 233 * \0 026 \0 \0 \0 a u
0000040 t o _ 2 0 0 9 0 3 1 8 _ 0 6 1 0
0000060 . c s v 344 275 [ s 333 310 226 . 370 > 021 363
jack3:/usr/genoais/aiq/aiq/bin$ od -c report.zip | head -n 4
0000000 P K 003 004 024 \0 \0 \0 \b \0 a I s : S F
0000020 357 277 275 357 277 275 C & 005 \0 357 277 275 357 277 275
0000040 * \0 026 \0 \0 \0 a u t o _ 2 0 0 9 0
0000060 3 1 8 _ 0 6 1 0 . c s v 357 277 275 357

Re^5: Zip file from WWW::Mechanize
by ikegami (Patriarch) on Mar 23, 2009 at 20:47 UTC
    "\215" (and basically every other byte ≥128) are being replaced with "\357\277\275". "\357\277\275" is the UTF-8 encoding of \x{FFFD}, the replacement character used to represent bad data.
    $ perl -MEncode -e'printf "%04X\n", ord decode "UTF-8", "\357\277\275"'
    FFFD

    It sounds like something tried to decode the zip file.

    $ perl -MEncode -we'print decode "UTF-8", "\215"' | od -c
    Wide character in print at -e line 1.
    0000000 357 277 275
    0000003
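The same effect can be reproduced on the first bytes that differ between the two dumps. This is only a minimal sketch, assuming the damage came from a UTF-8 decode followed by a UTF-8 encode on output. The byte values are copied from offset 0000020 of the od listing for the intact file, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# First four bytes at offset 0000020 of the intact zip, per the od output.
my $original = "\215\200C&";

# Decode as UTF-8 (malformed bytes become U+FFFD by default), then encode
# the resulting characters back to UTF-8, as happens when decoded text is
# written to a file.
my $damaged = encode('UTF-8', decode('UTF-8', $original));

# Print the result roughly the way od -c would: octal for bytes >= 128.
print join(' ', map { ord($_) < 128 ? $_ : sprintf('%03o', ord $_) }
                split //, $damaged), "\n";
```

Each of the two high bytes comes back as the three-byte sequence 357 277 275, while the ASCII bytes pass through unchanged, which is the pattern seen in report.zip.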

    While WWW::Mechanize's content calls decoded_content (defined in HTTP::Message), decoded_content shouldn't attempt to decode a zip file (only files with MIME type text/*).

    Is the web server incorrectly saying the .zip is a UTF-8 text file? Could you provide the output of the following:

    print $mech->response()->headers()->as_string();

    Delete "Set-Cookie:" headers and other authentication data before posting.
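If editing the dump by hand feels error-prone, the sensitive headers can be dropped in code before printing. The following is a sketch rather than a tested recipe: it builds a stand-in HTTP::Headers object so that it runs without a network connection, the Set-Cookie value is made up, and the list of headers to strip is an assumption about what counts as authentication data.

```perl
use strict;
use warnings;
use HTTP::Headers;

# Stand-in headers. The first two values come from this thread; the
# Set-Cookie value is a placeholder, not real data.
my $headers = HTTP::Headers->new(
    'Content-Type'        => 'text/html; charset=utf-8',
    'Content-Disposition' => 'attachment;filename=auto_20090318_0610.zip',
    'Set-Cookie'          => 'session=PLACEHOLDER',
);

# Work on a copy so the original object keeps its cookies.
my $safe = $headers->clone;
$safe->remove_header('Set-Cookie', 'Set-Cookie2', 'Authorization');

my $dump = $safe->as_string;
print $dump;
```

With a live object, the copy would presumably start from `$mech->response()->headers()->clone` instead of the hand-built headers.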

      Does this help? This is a Dumper($mech):
      'content-type' => 'text/html; charset=utf-8',
      'server' => 'Microsoft-IIS/6.0',
      'content-style-type' => 'text/css',
      'x-are' => 'you digging my headers?',
      'x-powered-by' => [
                          'http://www.bandwidth.com',
                          'ASP.NET'
                        ],
      'content-disposition' => 'attachment;filename=auto_20090318_0610.zip',
      'client-response-num' => 1,
      'content-length' => '337991',
      'x-aspnet-version' => '2.0.50727',
      I notice the content-type and content length.

      Jack

        That confirms that the server is giving you garbage. It's saying that the zip file is really a UTF-8 HTML document.

        'content-type' => 'text/html; charset=utf-8',
        'content-disposition' => 'attachment;filename=auto_20090318_0610.zip',

        Since the server can't be trusted here, the solution is to fix the response received from the web server before its content gets decoded.

        use WWW::Mechanize;  # must already be loaded when the BEGIN block runs

        BEGIN {
            # Wrap WWW::Mechanize's private _make_request method.
            my $old_make_request = WWW::Mechanize->can('_make_request');
            no warnings 'redefine';
            *WWW::Mechanize::_make_request = sub {
                my $response = $old_make_request->(@_);
                my $type  = $response->header('Content-Type');
                my $dispo = $response->header('Content-Disposition');
                # Relabel only text/* responses whose attachment name ends in .zip.
                $response->header('Content-Type' => 'application/zip')
                    if defined($dispo) && $dispo =~ m{\.zip$}
                    && defined($type)  && $type  =~ m{^text/};
                return $response;
            };
        }

        Untested.
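The header rewrite can be checked offline before it is wired into WWW::Mechanize. The sketch below is an assumption-laden stand-in: `fix_zip_content_type` is a hypothetical helper that carries the same test as the override, the response is built by hand with the two headers from this thread, and the body is just the zip signature followed by two high bytes from the od listing.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: same condition as the _make_request override.
sub fix_zip_content_type {
    my ($response) = @_;
    my $type  = $response->header('Content-Type');
    my $dispo = $response->header('Content-Disposition');
    $response->header('Content-Type' => 'application/zip')
        if defined($dispo) && $dispo =~ m{\.zip$}
        && defined($type)  && $type  =~ m{^text/};
    return $response;
}

# Stand-in body: zip signature plus two bytes that are not valid UTF-8.
my $body = "PK\003\004\215\200";

my $response = HTTP::Response->new(200, 'OK', [
    'Content-Type'        => 'text/html; charset=utf-8',
    'Content-Disposition' => 'attachment;filename=auto_20090318_0610.zip',
], $body);

my $before = $response->decoded_content;   # charset-decoded as text/html
fix_zip_content_type($response);
my $after  = $response->decoded_content;   # no longer treated as text

print "before fix, bytes intact: ", ($before eq $body ? "yes" : "no"), "\n";
print "after fix, bytes intact:  ", ($after  eq $body ? "yes" : "no"), "\n";
```

Note that the pattern only matches when the Content-Disposition value ends in `.zip`, as it does here; a server that quotes the filename would need a looser pattern.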