coppit has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks! I'm trying to write a routine which takes a filehandle of compressed data and returns a filehandle to the uncompressed data. I don't want to use Compress::Zlib because I want to support tzip, bzip2, etc. Here's what I've implemented using open2:
    use IPC::Open2;
    use FileHandle;

    sub decompress_filehandle {
        my $fh = shift;
        $WRITEHANDLE = new FileHandle();
        $READHANDLE = new FileHandle();
        $procid = open2($READHANDLE,$WRITEHANDLE,'gzip -cd');
        #$procid = open2($READHANDLE,$WRITEHANDLE,'c:\progra~1\cygwin\bin\gzip.exe -cd');
        die "Couldn't open2: $!\n" unless defined $procid;
        while (<$fh>) {
            print $WRITEHANDLE $_;
        }
        close ($WRITEHANDLE) or die "Error in child: $!\n";
        return $READHANDLE;
        # put in END{}?
        # waitpid $pid,0;
    }

    my $fh = new FileHandle('mailarc-1.txt.gz');
    my $dfh = decompress_filehandle($fh);
    while(<$dfh>) {
        print $_;
    }
    close ($dfh) or die "Error in child: $!\n";
This works fine on Linux, but hangs on the open2 line with Windows XP, ActiveState perl 5.8.0. I can see gzip is running in task manager. I suspect it's some sort of buffering problem. Can anyone help? As a second attempt, I tried using IPC::Run:
    use IPC::Run qw( run );
    use FileHandle;

    sub decompress_filehandle {
        my $fh = shift;
        my $out = new FileHandle;
        #run ['gzip','-cd'], $fh, '>pipe', $out;
        run ['c:\progra~1\cygwin\bin\gzip.exe','-cd'], $fh, '>pipe', $out;
        return $out;
    }

    my $fh = new FileHandle('mailarc-1.txt.gz');
    my $dfh = decompress_filehandle($fh);
    print $_ while <$dfh>;
Unfortunately this hangs on both Linux and WinXP. :( Any suggestions would be appreciated.

Re: Help!: Open2 Hangs on Windows (doomed)
by tye (Sage) on Sep 04, 2003 at 19:47 UTC

    This is doomed to fail: most compression programs are likely to start producing output before they have read the entire input stream, so you'll get a deadlock unless your data is very small.

    First, the operating system buffer for the pipe that the compression program is writing to fills up. Then the compression program tries to write more data out and this hangs waiting for the pipe to be drained (at least partially). This stops the compression program from reading further input. So soon the other pipe's buffer fills up. Then your Perl script tries to write to it and hangs waiting for that pipe to be drained (at least partially).

    Finally the two processes just sit around waiting for each other to drain their pipes which never happens and eventually you get tired of waiting and kill them.

                    - tye
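The pipe buffer limit behind this deadlock can be observed directly. The following is a minimal sketch, assuming a Unix-like system where a pipe handle can be switched to non-blocking mode; the helper name `pipe_capacity` is hypothetical. It writes into a pipe that nobody reads until the operating system refuses more data, which is the point at which a blocking writer such as the script above would hang.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: count how many bytes a pipe accepts when
# nothing is reading from the other end.
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    $writer->blocking(0);          # make writes fail instead of hanging

    my $chunk = 'x' x 1024;
    my $total = 0;
    while (1) {
        my $n = syswrite($writer, $chunk);
        if (defined $n) {
            $total += $n;
            next;
        }
        # EAGAIN/EWOULDBLOCK means the buffer is full; a blocking
        # writer would be stuck at exactly this point.
        last if $!{EAGAIN} || $!{EWOULDBLOCK};
        die "syswrite failed: $!";
    }
    close $writer;
    close $reader;
    return $total;
}

print "pipe accepted ", pipe_capacity(), " bytes before a write would block\n";
```

Any input larger than the reported figure (plus whatever the child process buffers internally) is enough to trigger the mutual wait described above.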
      Thanks for the explanation. What if I fork, and have the child process feed data to the parent? That way the parent can read from the pipe when it needs to, and the child can fill it independently of the parent. Here's the modified code:
          use FileHandle;

          sub pipe_from_fork ($) {
              my $parent = shift;
              pipe $parent, my $child or die;
              my $pid = fork();
              die "fork() failed: $!" unless defined $pid;
              if ($pid) {
                  close $child;
              }
              else {
                  close $parent;
                  open(STDOUT, ">&=" . fileno($child)) or die;
              }
              $pid;
          }

          sub decompress_filehandle {
              my $fh = shift;
              my $dfh = new FileHandle;
              # Must use pipe_from_fork because $dfh->open('-|') not yet implemented on
              # Windows. See perlfork for details
              unless (pipe_from_fork($dfh)) {
                  # In child
                  close $dfh;
                  open(FRONT_OF_PIPE, '| c:\progra~1\cygwin\bin\gzip.exe -cd')
                      or return (undef,"Can't execute \"$filter_command\" on file handle: $!");
                  print FRONT_OF_PIPE <$fh>;
                  $fh->close()
                      or return (undef,"Can't execute \"$filter_command\" on file handle: $!");
                  close FRONT_OF_PIPE;
                  exit;
              }
              # In parent
              close $fh;
              return $dfh;
          }

          my $fh = new FileHandle('mailarc-1.txt.gz');
          my $dfh = decompress_filehandle($fh);
          print $_ while <$dfh>;
      As before, this works on Unix but not Windows. It seems to hang trying to close the pipe to gzip -cd. I would have thought that this would be a common thing to do, even on Windows...

        Why not simply pass the name of the file to the command, let it read from disk, and then have it pipe the output back to you? Why complicate things?

        I'm also intrigued by why, given tye's educational explanation above, your method would work on Linux and not on Windows. The only rational explanation I can think of is that Linux uses bigger buffers on its pipes than Windows, and the file you are testing with fits entirely within the buffer on the former, so it never blocks.

        Have you tried this on Linux with a bigger input file?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Help!: Open2 Hangs on Windows
by runrig (Abbot) on Sep 04, 2003 at 21:59 UTC
    One idea: use File::Temp to get a temporary filename, and write your gzip output to that. Then read from the temp file. (You probably wanted to avoid a temp file, but at least this module can clean it up for you.)
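The temp-file idea might be sketched as follows. This is only an illustration under stated assumptions: the helper name `decompress_to_tempfile` is hypothetical, the decompression command reads standard input and writes standard output (as `gzip -cd` does), and the input handle has not been read from yet. Because the command writes to a file rather than a pipe, it can never block waiting for the script to read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: run @cmd with $in as its standard input and an
# anonymous temp file as its standard output, then return a handle
# positioned at the start of that temp file.
sub decompress_to_tempfile {
    my ($in, @cmd) = @_;

    # In scalar context tempfile() returns only a handle; File::Temp
    # takes care of removing the file.
    my $tmp = tempfile();
    binmode $tmp;

    # Save the script's own STDIN/STDOUT so they can be restored.
    open(my $saved_in,  '<&', \*STDIN)  or die "Can't save STDIN: $!";
    open(my $saved_out, '>&', \*STDOUT) or die "Can't save STDOUT: $!";

    # The child started by system() inherits these redirections.
    open(STDIN,  '<&', $in)  or die "Can't redirect STDIN: $!";
    open(STDOUT, '>&', $tmp) or die "Can't redirect STDOUT: $!";
    my $status = system(@cmd);

    open(STDIN,  '<&', $saved_in)  or die "Can't restore STDIN: $!";
    open(STDOUT, '>&', $saved_out) or die "Can't restore STDOUT: $!";
    die "'@cmd' failed with status $status\n" if $status != 0;

    seek($tmp, 0, 0) or die "Can't rewind temp file: $!";
    return $tmp;
}
```

A call such as `my $dfh = decompress_to_tempfile($fh, 'gzip', '-cd');` would then return a handle that can be read line by line. The cost is disk space for the whole uncompressed output.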
Re: Help!: Open2 Hangs on Windows
by graff (Chancellor) on Sep 05, 2003 at 04:09 UTC
    The different behaviors of Windows vs. Linux might involve a number of issues. First off, you're opening a compressed file for input (via the $fh file handle), but then you're treating it as if it were plain text -- you don't seem to be doing binmode on it (which would be essential for it to work properly on Windows, IIRC), and you use while (<$fh>) to read it, as if you could reasonably expect it to be a series of "lines" terminated with the standard value of $/ ($INPUT_RECORD_SEPARATOR; line endings are of course handled a bit differently on Windows than on Linux). I think tye's remarks about the buffering gridlock are on the mark in your case, but I also think that the way you're handling the input has something to do with it as well.
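The binmode point can be illustrated with a small copy routine. This is a minimal sketch; the helper name `copy_binary` and the 64 KB block size are assumptions chosen for illustration. It puts both handles into binary mode and moves fixed-size blocks with read() instead of relying on line boundaries, so bytes that happen to look like line endings are passed through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from $in to $out as raw bytes.
# Returns the number of bytes copied.
sub copy_binary {
    my ($in, $out) = @_;
    binmode $in;     # no line-ending translation on input
    binmode $out;    # ... or on output

    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, 65536);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                       # end of file
        print {$out} $buf or die "write failed: $!";
        $total += $n;
    }
    return $total;
}
```

In the original open2 example, a routine like this could replace the `while (<$fh>) { print $WRITEHANDLE $_; }` loop. It addresses only the text-mode issue, not the pipe deadlock.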

    What you might want, rather than open2 or uncompressing to a temp file, is simply something like this:

        open( IN, "gzip -cd $input_file_name |" );
        while (<IN>) {
            ... # process lines of uncompressed text
        }
    I gather you want flexibility with different compression methods, and you would like to modularize it (give a file name to a sub, have it return a file handle for reading the uncompressed data). Both of those goals can be achieved with this sort of single-handle procedure (but I'll leave it to you to figure out how).
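One way the single-handle approach could be modularized is sketched below. The names `%DECOMPRESSOR`, `decompressor_for`, and `open_decompressed` are hypothetical, and the table lists only two commands as examples. The sketch also assumes the list form of pipe open is available, which may not be the case on Windows with older Perls; there the command would have to be built as a single string, as in the snippet above.

```perl
use strict;
use warnings;

# Hypothetical table mapping file extensions to decompression commands
# that write the uncompressed data to standard output.
my %DECOMPRESSOR = (
    gz  => [ 'gzip',  '-cd' ],
    bz2 => [ 'bzip2', '-cd' ],
);

# Return the command (as a list) for a file name, or an empty list
# when the extension is not in the table.
sub decompressor_for {
    my ($name) = @_;
    my ($ext) = $name =~ /\.([^.\\\/]+)\z/;
    return unless defined $ext && $DECOMPRESSOR{ lc $ext };
    return @{ $DECOMPRESSOR{ lc $ext } };
}

# Give it a file name, get back a handle for reading uncompressed data.
sub open_decompressed {
    my ($name) = @_;
    my @cmd = decompressor_for($name)
        or die "No decompressor known for '$name'\n";
    # The command reads the file itself, so only one pipe is involved
    # and the two-pipe deadlock cannot occur.
    open(my $fh, '-|', @cmd, $name)
        or die "Can't run $cmd[0]: $!\n";
    return $fh;
}
```

Usage would look like `my $dfh = open_decompressed('mailarc-1.txt.gz'); print while <$dfh>; close $dfh or die "Error in child: $!\n";`. Passing the command as a list also avoids shell quoting problems with unusual file names.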