in reply to Re^6: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
in thread Solved: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

I see, thanks. Can you explain when it is possible for the output of "gunzip" to be valid (but partial, truncated) uncompressed data followed by an obviously binary, still-compressed "tail", as in Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize? I couldn't get such a result regardless of "Transparent" and all other parameters; I always got only the partial uncompressed data.

Re^8: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by pmqs (Friar) on Dec 20, 2018 at 13:52 UTC

    IO::Uncompress::Gunzip expects to be given a valid and complete gzip data stream. If it isn't, it fails.

    I'll use the compressed data in $gzipped below to illustrate.

    use IO::Compress::Gzip qw(gzip);
    use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

    my $data = 'I include only the bare bones because I tried something';

    # Create some compressed data
    my $gzipped;
    gzip \$data => \$gzipped;

    Let's start with the "valid" part. If I corrupt the compressed data stream, bad things happen:

    my $corrupt = $gzipped;

    # Overwrite part of the compressed data with junk
    substr($corrupt, 10, 3, "BAD");

    my $uncompressed;
    gunzip \$corrupt => \$uncompressed
        or print "Cannot gunzip: $GunzipError\n";

    That will output

    Cannot gunzip: Inflation Error: data error

    If you get that, there is no point in continuing.

    Next is a truncated data stream (which is what this ticket is all about).

    # Truncate the compressed data
    my $truncated = substr($gzipped, 0, 10);

    gunzip \$truncated => \$uncompressed
        or print "Cannot gunzip: $GunzipError\n";

    That will output

    Cannot gunzip: unexpected end of file

    In this instance, you can try to get more data, append to the input buffer ($truncated in this case) and uncompress the whole thing again. The only semi-valid use for this technique is when you are certain that you will eventually get a complete gzip data stream. That does not seem to be the case in this instance.
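    The append-and-retry idea can be sketched roughly as follows. This is only an illustration under assumptions: the @chunks array is a hypothetical stand-in for data arriving piece by piece (here the known-good $gzipped is simply cut into 10-byte pieces), and Transparent is switched off so that input not recognised as gzip is reported as an error rather than passed through.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $data = 'I include only the bare bones because I tried something';

# Create some compressed data
my $gzipped;
gzip \$data => \$gzipped
    or die "Cannot gzip: $GzipError\n";

# Hypothetical stand-in for reads from a socket: 10-byte pieces
my @chunks = unpack '(a10)*', $gzipped;

my $buffer = '';
my $result;

for my $chunk (@chunks) {
    # Append the new data to everything received so far
    $buffer .= $chunk;

    # Uncompress the whole buffer again, into a fresh output buffer,
    # and only keep the output if gunzip reports success
    my $attempt;
    if (gunzip \$buffer => \$attempt, Transparent => 0) {
        $result = $attempt;
        last;
    }
}

print defined $result
    ? "Uncompressed: $result\n"
    : "Stream never completed: $GunzipError\n";
```

    Note that every failed attempt throws its output away and starts again from the beginning of the buffer, so this only makes sense if the complete stream is going to arrive eventually.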