VZmaster has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm trying to download HTML pages. Everything works fine except for pages where a single HTML line is longer than 64k bytes. The download is interrupted: the line is cut at 64k and the rest of the page is skipped. The return code is 200 OK.

Interestingly, wget does the same: it truncates the HTML page at a line length of 64k, with no errors.

But any browser on a Windows machine opens those pages correctly.

Setting
use LWP::Protocol::http;
push(@LWP::Protocol::http::EXTRA_SOCK_OPTS, MaxLineLength => 0);
does not help at all. Besides, I think this limit is for headers.

RHEL5 Server x64, perl 5.8.8

Re: LWP hidden 64k content line length limit
by Corion (Patriarch) on Jan 08, 2016 at 09:05 UTC

    Are you really certain that the problem is with wget and LWP, and not maybe with the web server sending different things according to (for example) the User-Agent header?

    As HTTP also can transfer binary data without "lines" at all, I highly doubt that there is a "64k line length limit" in either wget or LWP.

    I would investigate the problem by making certain that wget, LWP and your Windows browsers send the identical headers to the remote end and inspecting closely what gets sent back.
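
    One way to do that comparison from the LWP side is to print the headers as they are sent and received. This is only a minimal sketch: the URL and the User-Agent string are placeholders, and it assumes an LWP recent enough to provide add_handler (the LWP shipped with older distributions may not have it).

    ```perl
    use strict;
    use warnings;
    use LWP::UserAgent;

    # Hypothetical URL; substitute the page that comes back truncated.
    my $url = 'http://example.com/page.php';

    my $ua = LWP::UserAgent->new;
    # Placeholder; paste the exact User-Agent string your browser sends.
    $ua->agent('Mozilla/5.0');

    # Print the headers of every request just before it is sent ...
    $ua->add_handler(request_send  => sub { print shift->headers->as_string, "\n"; return });
    # ... and the headers of every response once it is complete.
    $ua->add_handler(response_done => sub { print shift->headers->as_string, "\n"; return });

    my $res = $ua->get($url);
    print $res->status_line, "\n";
    print 'Body length: ', length($res->content), "\n";
    ```

    Comparing this output with what the browser's developer tools or a proxy show should make any difference in the request or response headers visible.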

      I think you are right that it depends on the web server: when I created a copy of the page on my own web server, wget downloaded it fine.

      So it seems wget/LWP can't handle something related to line length from the original server.

      The page is generated by PHP, so the headers don't include page length information, which is why there is no error when the page is cut off.
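
      Since the server gives no length to check against, a truncated page can only be spotted heuristically. A rough sketch of such a check follows; truncation_hints is a made-up helper name, the closing-tag test only makes sense for HTML, and the demo uses a hand-built response so it runs without network access.

      ```perl
      use strict;
      use warnings;
      use HTTP::Response;

      # Hypothetical helper: returns a list of reasons the body looks truncated.
      sub truncation_hints {
          my ($res) = @_;
          my @hints;
          my $body = $res->content;    # raw bytes as received

          # If a Content-Length was declared, it should match what we got.
          my $declared = $res->header('Content-Length');
          if (defined $declared && $declared != length $body) {
              push @hints, "Content-Length is $declared but got " . length($body) . " bytes";
          }

          # LWP adds these headers when a transfer is aborted part-way.
          push @hints, 'LWP reported an aborted transfer'
              if $res->header('Client-Aborted') || $res->header('X-Died');

          # Heuristic for HTML only: a complete page normally ends with </html>.
          push @hints, 'no closing </html> tag'
              unless $body =~ m{</html\s*>\s*\z}i;

          return @hints;
      }

      # Offline demo with a hand-built response that stops mid-page.
      my $res = HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/html' ],
                                    '<html><body>cut off here');
      print "$_\n" for truncation_hints($res);
      ```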

        So it seems wget/LWP can't handle something related to line length from the original server.

        Which version of LWP?

      Investigating the headers pointed at gzip compression: all browsers get the content gzipped, while wget and LWP get it as text/plain.

      I disabled gzip in the browsers and voilà: truncated pages. So the problem originates on the server side.

      But adding the header Accept-Encoding: gzip, deflate, sdch does not help in wget; I still get text/plain. I copied the exact header from the browser, but I still get a text/plain response.

      How do I get a gzipped response in LWP?

        Send all the same headers.
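
        For the Accept-Encoding part specifically, a minimal sketch could look like the following. The URL is a placeholder, and it assumes an LWP recent enough to provide HTTP::Message::decodable and decoded_content, with the zlib modules installed. Whether the server actually compresses the response is still up to the server, which is why the other browser headers may need to be copied as well.

        ```perl
        use strict;
        use warnings;
        use LWP::UserAgent;
        use HTTP::Message;

        # Hypothetical URL; substitute the page that comes back truncated.
        my $url = 'http://example.com/page.php';

        my $ua = LWP::UserAgent->new;

        # Comma-separated list of the encodings this Perl install can decode.
        my $can_accept = HTTP::Message::decodable();

        # Per-request header: ask the server for a compressed response.
        my $res = $ua->get($url, 'Accept-Encoding' => $can_accept);

        # Shows whether the server really compressed the body.
        printf "Content-Encoding: %s\n", $res->header('Content-Encoding') || '(none)';

        # content() is the raw (possibly gzipped) body;
        # decoded_content() undoes the Content-Encoding and the charset.
        my $html = $res->decoded_content;
        print 'Decoded length: ', (defined $html ? length $html : 0), "\n";
        ```

        Note that gzip shows up in the Content-Encoding response header; Content-Type stays text/html or text/plain either way.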