in reply to Different compression behaviours with Compress::Zlib

Try using the inflate method as taken from the perldoc for Compress::Zlib.
There is a difference in how the headers are written, which explains why the second method is incompatible with gzip.

Note that for zip file manipulation the docs suggest using Archive::Zip, which I would guess is a reason for the default behaviour. I'm still looking into this, but maybe this will help?

#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;

# Inflate a deflated stream read from STDIN, 4K at a time
my $x = inflateInit()
    or die "Cannot create inflation stream";

my $input = "";
binmode STDIN;
binmode STDOUT;

my ($output, $status);
while (read(STDIN, $input, 4096)) {
    ($output, $status) = $x->inflate(\$input);
    print $output if $status == Z_OK or $status == Z_STREAM_END;
    last if $status != Z_OK;    # stop at end of stream or on error
}
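
If I understand it right, a minimal sketch of the write side shows the same split: gzopen adds the gzip header that gzip(1) expects, while deflateInit produces a bare zlib stream with a different header. The filename is just an example.

#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;

# gzopen writes a gzip header, so gzip(1) can read the result
my $gz = gzopen("out.gz", "wb") or die "Cannot open out.gz: $gzerrno";
$gz->gzwrite("hello world\n")   or die "gzwrite failed: $gzerrno";
$gz->gzclose();

# deflateInit produces a zlib stream with no gzip header,
# which is why gzip rejects it
my $d = deflateInit() or die "Cannot create deflation stream";
my ($chunk, $dstatus) = $d->deflate("hello world\n");
my ($tail,  $fstatus) = $d->flush();
print length($chunk . $tail), " bytes of zlib data (not gzip)\n";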

Re:x2 Different compression behaviours with Compress::Zlib
by grinder (Bishop) on Sep 11, 2001 at 00:51 UTC

    A little more background. I want to rewrite a backup script that is currently written in shell. The first step is to write the names of all the files to be backed up to a catalog file, and then write that file to tape. That step alone takes two to three hours. The fact that several processes are spawned per file probably explains why it is so slow.

    So I want to use File::Find to get the file names, and then write them, line by line, to a gzipped file directly on tape.
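
    Roughly, I picture the write side looking something like this (the tape device path and the starting directory are only placeholders):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;
        use Compress::Zlib;

        my $tape = '/dev/nst0';                  # placeholder tape device
        my $gz = gzopen($tape, "wb")
            or die "Cannot open $tape: $gzerrno";

        find(sub {
            return unless -f $_;                 # catalog regular files only
            $gz->gzwrite("$File::Find::name\n")
                or die "gzwrite failed: $gzerrno";
        }, '/export/home');                      # placeholder start directory
        $gz->gzclose();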

    When it comes to restoring, I'd like to read the catalog back off tape, line by line, and apply a regex to each line to see whether the file in question is to be restored.
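
    Reading it back with gzreadline would give me that line-by-line view directly; this is only a sketch, and the pattern is just an example:

        use strict;
        use warnings;
        use Compress::Zlib;

        my $tape = '/dev/nst0';                  # placeholder tape device
        my $gz = gzopen($tape, "rb")
            or die "Cannot open $tape: $gzerrno";

        my (@todo, $line);
        while ($gz->gzreadline($line) > 0) {     # one record at a time
            chomp(my $file = $line);
            push @todo, $file if $file =~ m{^/export/home/grinder};  # example regex
        }
        $gz->gzclose();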

    Using gzopen will do the trick, but I'd like to benchmark whether spending more time compressing the file harder, and so writing fewer bytes to a glacially slow medium, works out faster than compressing it lightly and spending more time writing to the device. OS buffering may make the issue moot, but I'd like some hard numbers.
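
    As far as I can tell, the compression level can be chosen either through the gzopen mode string or through deflateInit, which is what I'd vary in the benchmark (paths here are placeholders):

        use strict;
        use warnings;
        use Compress::Zlib;

        # level 1 favours speed, level 9 favours size
        my $fast  = gzopen('/tmp/fast.gz',  "wb1") or die $gzerrno;
        my $tight = gzopen('/tmp/tight.gz', "wb9") or die $gzerrno;

        # the deflation interface takes a -Level option instead
        my $d = deflateInit(-Level => Z_BEST_COMPRESSION)
            or die "Cannot create deflation stream";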

    Currently I have a Perl script that takes 17 minutes instead of 150-180 minutes, and generates a 2MB catalog instead of a 40-45MB uncompressed one.

    I could read the file back block by block, but to simplify the algorithm (to avoid having to worry about a record that is split across two chunks) I'd have to write the whole thing back onto disk first, then reopen it, or seek to zero, and loop through applying the regex. I was hoping I could read the compressed catalog off the tape, inflate it record by record, apply the regex and push any hits onto a @todo list in one pass.
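
    Something along these lines is what I have in mind. It's only a sketch (device path and regex are placeholders, and it assumes the catalog was written with the matching deflate interface); any partial record left at the end of a chunk is carried over into the next one:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Compress::Zlib;

        my $i = inflateInit()
            or die "Cannot create inflation stream";

        open my $fh, '<', '/dev/nst0' or die "Cannot open tape: $!";
        binmode $fh;

        my @todo;
        my $pending = '';                        # partial record from last chunk
        my $buf;
        while (read($fh, $buf, 4096)) {
            my ($out, $status) = $i->inflate(\$buf);
            die "inflate failed: $status"
                unless $status == Z_OK or $status == Z_STREAM_END;
            $pending .= $out;
            # peel off complete records, keep the trailing partial one
            while ($pending =~ s/^([^\n]*)\n//) {
                my $rec = $1;
                push @todo, $rec if $rec =~ m{^/export/home/grinder};  # example regex
            }
            last if $status == Z_STREAM_END;
        }
        close $fh;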

    --
    g r i n d e r