After 12+ hours on this bug hunt, I concluded that HTTP::Response->parse() is buggy when parsing gzipped content, and switched to the HTTP::Parser module:
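use HTTP::Parser;
# $stored_as_file holds the raw response (status line, headers, body) as read back from disk
my $parser = HTTP::Parser->new(response => 1);
my $status = $parser->add( $stored_as_file );    # 0 = a complete message was parsed
warn "HTTP::Parser did not see a complete message (status $status)\n" if $status != 0;
my $restored_response = $parser->object();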
The fault does not seem to be mine, because this module gets the split between the header data and the (possibly gzipped) content data right every time! If I were on CPAN, I would file a bug report against the otherwise rock-solid libwww-perl modules.
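In an HTTP/1.x message the headers end at the first empty line and everything after it is the body; that is the split described above. Below is a minimal core-Perl sketch of it, assuming `$stored_as_file` holds the raw bytes exactly as received and was read with binmode. The helper name `split_stored_response` is hypothetical (it is not part of HTTP::Parser or libwww-perl). It keeps only the last value of a repeated header, does not handle chunked transfer encoding or folded header lines, and gunzips the body only when `Content-Encoding` says gzip.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: split a raw HTTP response into its status line,
# a hash ref of headers (lower-cased names) and the body.
# Assumes no chunked transfer encoding and no folded header lines.
sub split_stored_response {
    my ($raw) = @_;

    # Headers end at the first empty line; accept CRLF or bare LF endings.
    # The limit of 2 leaves any blank lines inside the body untouched.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    die "no header/body separator found\n" unless defined $body;

    my ($status_line, @header_lines) = split /\r?\n/, $head;
    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = $line =~ /^([^:]+):\s*(.*)$/ or next;
        $headers{ lc $name } = $value;    # a repeated header keeps its last value
    }

    # Only decompress when the server said the body is gzipped.
    if (($headers{'content-encoding'} // '') =~ /\bgzip\b/i) {
        my $plain;
        gunzip(\$body, \$plain) or die "gunzip failed: $GunzipError\n";
        $body = $plain;
    }
    return ($status_line, \%headers, $body);
}
```

Usage: `my ($status_line, $headers, $body) = split_stored_response($stored_as_file);`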
A few things to note:
- HTTP::Parser slightly modifies the reconstructed response: it adds an X-HTTP-Version header and removes the HTTP version from the status line.
- I still don't know whether it handles HTTP/0.9 and HTTP/1.0 responses correctly.
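Regarding HTTP/0.9: such a response has no status line and no headers, just the body, whereas HTTP/1.0 and HTTP/1.1 responses start with a status line such as `HTTP/1.0 200 OK`. A stored response can therefore be classified by checking for that line. The helper `parse_status_line` below is hypothetical (not part of any module), a sketch under that assumption; it says nothing about how HTTP::Parser itself treats such input.

```perl
use strict;
use warnings;

# Hypothetical helper: parse an "HTTP/1.1 200 OK" style status line.
# Returns (version, code, reason), or an empty list when the data does not
# start with a status line, which is what an HTTP/0.9 response looks like.
sub parse_status_line {
    my ($raw) = @_;
    if ($raw =~ m{\A(HTTP/\d+\.\d+)[ \t]+(\d{3})(?:[ \t]+([^\r\n]*))?\r?\n}) {
        return ($1, $2, defined $3 ? $3 : '');
    }
    return;    # no status line: treat everything as an HTTP/0.9 body
}

for my $raw ("HTTP/1.0 200 OK\r\n\r\nhi", "<html>old-style body</html>") {
    my ($version, $code, $reason) = parse_status_line($raw);
    print defined $version ? "$version $code ($reason)\n" : "HTTP/0.9 (no status line)\n";
}
```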
Update:
And then I found out that HTTP::Parser breaks on some pages: it fails to recognize which part of the message is the content, leaving an empty $content string...
Update 2:
I decided that I need finer control over the process, so I now use HTTP::MessageParser to reconstruct the original response object from the file:
use HTTP::MessageParser;
use HTTP::Response;
# parse_response() already returns the status-line fields, so a separate
# parse_response_line() call is not needed; $Body is a reference to the body
my ( $HTTP_Version, $Status_Code, $Reason_Phrase, $Headers, $Body )
    = HTTP::MessageParser->parse_response( $stored_as_file );
my $restored_response = HTTP::Response->new( $Status_Code, $Reason_Phrase, $Headers, ${$Body} );
And so far it works. Any comments?