Hmmm... on further investigation it looks like Yahoo returns different HTML to Moz when Moz asks for gzipped content.

Running

use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new('GET', 'http://groups.yahoo.com/');
$request->headers->header(
    accept_encoding => 'x-gzip, gzip, identity',
    user_agent      => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
);
my $r = $ua->request($request);
print $r->content;

will print the same gzipped content that Moz receives. So LWP isn't dropping anything; Moz is simply being served different content :-)
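To confirm that what comes back really is compressed data rather than damaged HTML, you can check for the two-byte gzip signature (0x1f 0x8b) that RFC 1952 places at the start of every gzip stream. This is a minimal sketch: looks_gzipped is a hypothetical helper name, and a signature match is only a quick heuristic, not full validation. The response's Content-Encoding header remains the authoritative indicator of what the server sent.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $bytes starts with the gzip signature
# 0x1f 0x8b (RFC 1952). A quick heuristic only; it does not validate
# the rest of the stream.
sub looks_gzipped {
    my ($bytes) = @_;
    return defined $bytes
        && length($bytes) >= 2
        && substr($bytes, 0, 2) eq "\x1f\x8b";
}

# Example, using the $r from the request code above:
#   printf "Content-Encoding: %s\n", $r->header('Content-Encoding') || 'none';
#   print looks_gzipped($r->content) ? "gzip body\n" : "uncompressed body\n";
```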

Re:^4 LWP not returning leading spaces in web page (ver 2)
by aspen (Sexton) on Feb 02, 2003 at 22:51 UTC

    Yes, exactly.

    So, how does one unzip the resulting content? I've tried using Compress::Zlib, without success. From what I've read, Apache adds a 10-byte header. I have tried stripping off the first 10 bytes, but in every case Zlib's inflate gives me error code -3, "unknown compression method".

    Do you know how to go about inflating an Apache-compressed page?

    Andy

    @_="the journeyman larry disciple keeps learning\n"=~/(.)/gs, print(map$_[$_-77],unpack(q=c*=,q@QSdM[]uRMNV^[ni_\[N]eki^y@))
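The -3 error (Z_DATA_ERROR) is consistent with how zlib's inflate behaves by default: it expects a zlib-format stream (RFC 1950), whereas a gzip body (RFC 1952) is raw deflate data wrapped in a gzip header and an 8-byte trailer. The gzip header is exactly 10 bytes only when none of its optional fields are present. The sketch below parses the header and asks Compress::Zlib for raw deflate by passing a negative -WindowBits value. gunzip_by_hand is a hypothetical name, and the sketch skips the CRC32/length trailer check and ignores multi-member files, so treat it as an illustration of the failure rather than a replacement for memGunzip.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Hypothetical helper: decompress a single-member gzip buffer by skipping
# the RFC 1952 header and inflating the raw deflate data behind it.
# Returns undef on any error. The CRC32/ISIZE trailer is NOT verified.
sub gunzip_by_hand {
    my ($gz) = @_;
    # Smallest possible gzip member: 10-byte header + 8-byte trailer.
    return undef unless defined $gz && length($gz) >= 18;

    my ($id1, $id2, $cm, $flg) = unpack('C4', $gz);
    return undef unless $id1 == 0x1f && $id2 == 0x8b && $cm == 8;

    my $pos = 10;                       # end of the fixed-size header
    if ($flg & 0x04) {                  # FEXTRA: 2-byte little-endian length, then data
        return undef if $pos + 2 > length $gz;
        $pos += 2 + unpack('v', substr($gz, $pos, 2));
    }
    for my $flag (0x08, 0x10) {         # FNAME, then FCOMMENT: NUL-terminated strings
        next unless $flg & $flag;
        my $nul = index($gz, "\0", $pos);
        return undef if $nul < 0;
        $pos = $nul + 1;
    }
    $pos += 2 if $flg & 0x02;           # FHCRC: 2-byte CRC16 of the header
    return undef if $pos >= length $gz;

    my $deflated = substr($gz, $pos);
    # A negative window size tells zlib there is no zlib header,
    # i.e. the input is raw deflate, which is what a gzip member holds.
    my ($inflater, $status) = inflateInit(-WindowBits => -MAX_WBITS());
    return undef unless $inflater && $status == Z_OK;

    my ($out, $istatus) = $inflater->inflate($deflated);
    return $istatus == Z_STREAM_END ? $out : undef;
}
```

Inflating with the default (positive) window bits instead reproduces the zlib header complaint, which is why stripping bytes alone was not enough.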

      You want the gzip-related functions of Compress::Zlib rather than inflate. The most direct approach would be:

      use LWP;
      use Compress::Zlib;

      my $ua = LWP::UserAgent->new;
      my $request = HTTP::Request->new('GET', 'http://groups.yahoo.com/');
      $request->headers->header(
          accept_encoding => 'x-gzip, gzip, identity',
          user_agent      => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
      );
      my $r = $ua->request($request);
      my $gzipped_content = $r->content;
      print Compress::Zlib::memGunzip($gzipped_content);

        adrianh, I really owe you a thank you!

        The final solution was to ask groups.yahoo.com to return the page gzip-compressed and then decompress it with the Compress::Zlib call you pointed me to.
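A slightly more defensive version of that final step is to decompress only when the server says it sent gzip, and to stop when decompression fails, since memGunzip returns undef on bad input. A minimal sketch, assuming the body and the Content-Encoding header value are available (for example via $r->content and $r->header('Content-Encoding')); decode_body is a hypothetical name and only gzip/x-gzip are handled.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Hypothetical helper: return the decoded body for a response whose
# Content-Encoding header value is $encoding (may be undef).
# Only gzip/x-gzip are decoded; anything else is returned unchanged.
# Dies if the body claims to be gzip but cannot be gunzipped.
sub decode_body {
    my ($body, $encoding) = @_;
    return $body
        unless defined $encoding && $encoding =~ /^\s*(?:x-)?gzip\s*$/i;

    my $plain = Compress::Zlib::memGunzip($body);    # undef on failure
    die "memGunzip failed: body is not valid gzip data\n" unless defined $plain;
    return $plain;
}

# Example, using the $r from the request code above:
#   print decode_body($r->content, $r->header('Content-Encoding'));
```

Later versions of HTTP::Message also provide a decoded_content method that undoes a gzip Content-Encoding for you, which avoids calling Compress::Zlib directly.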

        Thanks again to all. I AM now retrieving the pages I originally wanted, the ones containing SMTP headers, with their leading spaces intact.

        Apache was stripping out the leading spaces when serving the pages uncompressed. From the Apache documentation I read, this appears to be something called "light compression". I did not find a request header that turns it off, other than asking for gzip encoding.

        Andy