isync has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

I am having a lot of trouble reconstructing a full HTTP response from a file.
In simplified form, I used:
my $response = $useragent->get($url);
my $stored_as_file = $response->as_string;

to store the full http response to a file. Then, I'd like to recreate this response object by doing:
my $response = HTTP::Response->parse( $stored_as_file );

And it does not work! The problem seems to arise only with gzipped content. I sent an "Accept-Encoding: gzip" header with the original GET, and everything arrives and is stored to disk correctly. But when I use the parse method, the first part of the still-gzipped content ends up as a header field, which breaks the rest of the gzipped binary data in the content part of the HTTP response. It seems the parse method does not split the headers from the content at the "\n\n", but a bit further down, and so interprets part of the gzipped data as a header field...

Am I using the "parse" method incorrectly? Is it not meant to parse a whole response?

Or is there another solution, maybe tricking LWP::UserAgent into reading from disk and not from the web? Like reversing $ua->get( $url, :content_file )? (This is exactly what I'd like to do!)

Re: How to reconstruct HTTP::Response from file properly
by Corion (Patriarch) on May 02, 2007 at 06:41 UTC

    You don't show us how you save and restore the data to and from disk. Does the following code work for you?

    use strict;
    use Test::More tests => 1;

    # setup, to be provided by you
    my $response = $useragent->get($url);
    my $stored_as_file = $response->as_string;
    my $restored_response = HTTP::Response->parse( $stored_as_file );
    is $stored_as_file, $restored_response->as_string, "->as_string() is idempotent";

    If that code works for you and prints "OK", then the problem is with your saving and restoring, most likely because you're running on a system with a Unicode locale, or Win32. Most likely, you're missing a

    binmode $fh;
    after your
    open my $fh, ">", $temp_response_name or die "Couldn't create '$temp_response_name': $!";

      A Unicode problem was actually my thought as well... But I think I have everything set up correctly:
      $stored_as_file = Compress::Zlib::memGzip($stored_as_file);
      open(FILE, ">:utf8", "$file") or die "err: $!";
      binmode FILE;
      print FILE $stored_as_file;
      close(FILE);
      is what I use to write the data compressed to disk. And this
      local( $/, *FILE );
      open(FILE, "<:utf8", "$file") or die "err: $!";
      binmode FILE;
      my $stored_as_file = <FILE>;
      close(FILE);
      $stored_as_file = Compress::Zlib::memGunzip($stored_as_file);
      is used to read it back. BTW: I am running Linux (no binmode there, I thought...).

        Your compressed data is not unicode anymore - it's compressed octets. So I'd write it out and read it back in as plain as possible without the :utf8. The "No binmode for unixish OSes" mantra is a cargo cult meme going back to the days where there were no IO layers. You should always use binmode on your binary files.
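
The advice above (write and read the compressed octets with no :utf8 layer) could look roughly like the following. This is only a minimal sketch, not the poster's actual code: the sample payload and the temporary file are assumptions made for illustration, and error handling is kept to the bare minimum.

```perl
use strict;
use warnings;
use Compress::Zlib ();          # memGzip / memGunzip, called fully qualified
use File::Temp qw(tempfile);

# Illustrative payload standing in for $response->as_string (an assumption).
my $stored_as_file = "HTTP/1.1 200 OK\nContent-Type: text/plain\n\nhello\n";

# Temporary file used only for this demonstration.
my ($tmp_fh, $file) = tempfile( UNLINK => 1 );
close $tmp_fh;

# Write: compressed data is plain octets, so use the :raw layer, not :utf8.
my $gzipped = Compress::Zlib::memGzip($stored_as_file);
open my $out, '>:raw', $file or die "Couldn't create '$file': $!";
print {$out} $gzipped;
close $out or die "Couldn't close '$file': $!";

# Read: slurp the whole file back, again as raw octets.
open my $in, '<:raw', $file or die "Couldn't open '$file': $!";
my $read_back = do { local $/; <$in> };
close $in;

# Decompress and compare with what was originally stored.
my $restored = Compress::Zlib::memGunzip($read_back);
print $restored eq $stored_as_file ? "round trip ok\n" : "round trip FAILED\n";
```

Opening with '>:raw' and '<:raw' has the same effect as a plain open followed by binmode $fh; it just keeps the intent visible in the open call itself.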

Re: How to reconstruct HTTP::Response from file properly
by isync (Hermit) on May 02, 2007 at 13:06 UTC
    After 12+ hours on this bug hunt, I decided to declare the HTTP::Response->parse() method buggy when parsing gzipped content and switched over to the HTTP::Parser module:
    use HTTP::Parser;
    my $parser = HTTP::Parser->new(response => 1);
    my $status = $parser->add( $stored_as_file );
    my $restored_response = $parser->object();
    The fault does not seem to be on my part, because this module gets the split between header data and (possibly gzipped) content data right every time! If I were on CPAN, I would file a bug report for the otherwise rock-solid libwww-perl modules.

    A few things to note:
    - HTTP::Parser slightly modifies the original response when reconstructing it, adding an X-HTTP-Version header and removing HTTP-Version from the GET line.
    - I still don't know whether it handles HTTP/0.9 and HTTP/1.0 correctly.

    update:
    And then I found out that HTTP::Parser breaks on some pages, failing to recognize which part of the message is the content, which leads to an empty $content string...

    update2:
    I decided that I need finer control over the process and now I am using HTTP::MessageParser to re-construct the original response object from file:
    my ( $HTTP_Version, $Status_Code, $Reason_Phrase ) = HTTP::MessageParser->parse_response_line( $stored_as_file );
    my ( $Method, $Request_URI, $HTTP_Version, $Headers, $Body ) = HTTP::MessageParser->parse_response( $stored_as_file );
    my $restored_response = HTTP::Response->new( $Status_Code, $Reason_Phrase, $Headers, ${$Body} );
    and so far it works... (any comments?)
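
For comparison, the split the thread keeps coming back to (headers end at the first blank line, everything after that is body) can be sketched in core Perl alone. This is a hedged illustration, not how HTTP::Response, HTTP::Parser, or HTTP::MessageParser work internally: split_response is a hypothetical helper, the sample message is made up, and folded (continuation) header lines are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: split a serialized HTTP response into status line,
# header lines and body. Only the FIRST blank line counts as the separator,
# so blank lines or header-like text inside a binary body are left alone.
sub split_response {
    my ($raw) = @_;
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $body = '' unless defined $body;
    my ($status_line, @header_lines) = split /\r?\n/, $head;
    return ($status_line, \@header_lines, $body);
}

# Made-up message: the body starts with the gzip magic bytes and contains
# both a blank line and something that looks like a header field.
my $raw = "HTTP/1.1 200 OK\nContent-Encoding: gzip\nContent-Type: text/html\n\n"
        . "\x1f\x8b\x08\x00binary\n\nX-Fake: not a header\n";

my ($status_line, $headers, $body) = split_response($raw);

print "status line: $status_line\n";
print "header lines: ", scalar(@$headers), "\n";
print "body bytes:   ", length($body), "\n";
```

The limit of 2 passed to split is the important part: it stops splitting after the first blank line, so whatever follows stays in the body untouched.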