in reply to Help!: Open2 Hangs on Windows

This is doomed to fail: most compression programs are likely to start producing output before they have read the entire input stream, so unless your data is very small, you'll get a deadlock.

First, the operating system buffer for the pipe that the compression program is writing to fills up. Then the compression program tries to write more data out, and that write hangs waiting for the pipe to be drained (at least partially). This stops the compression program from reading further input, so soon the other pipe's buffer fills up as well. Then your Perl script tries to write to it and hangs waiting for that pipe to be drained (at least partially).

Finally, the two processes just sit around waiting for each other to drain their pipes, which never happens, until eventually you get tired of waiting and kill them.
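
In code, the doomed pattern looks roughly like this (a sketch; the gzip command and the data size are illustrative):

    use IPC::Open2;

    my $data = 'x' x 1_000_000;    # bigger than the pipe buffers

    # Reader handle first, writer handle second.
    my $pid = open2( my $from_gzip, my $to_gzip, 'gzip -c' );

    # This write blocks once gzip's output pipe fills up, so we never
    # reach the read below -- the deadlock described above.
    print {$to_gzip} $data;
    close $to_gzip;

    my $compressed = do { local $/; <$from_gzip> };
    waitpid $pid, 0;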

                - tye

Re: Re: Help!: Open2 Hangs on Windows (doomed)
by coppit (Beadle) on Sep 04, 2003 at 21:11 UTC
    Thanks for the explanation. What if I fork, and have the child process feed data to the parent? That way the parent can read from the pipe when it needs to, and the child can fill it independently of the parent. Here's the modified code:
    use FileHandle;

    sub pipe_from_fork ($) {
        my $parent = shift;
        pipe $parent, my $child or die;
        my $pid = fork();
        die "fork() failed: $!" unless defined $pid;
        if ($pid) {
            close $child;
        }
        else {
            close $parent;
            # Make the child's STDOUT an alias for its end of the pipe.
            open(STDOUT, ">&=" . fileno($child)) or die;
        }
        $pid;
    }

    sub decompress_filehandle {
        my $fh = shift;

        my $dfh = new FileHandle;

        # Must use pipe_from_fork because $dfh->open('-|') is not yet
        # implemented on Windows. See perlfork for details.
        unless (pipe_from_fork($dfh)) {
            # In child
            close $dfh;
            my $filter_command = 'c:\progra~1\cygwin\bin\gzip.exe -cd';
            binmode $fh;    # the compressed input is binary data
            open(FRONT_OF_PIPE, "| $filter_command")
                or return (undef, "Can't execute \"$filter_command\" on file handle: $!");
            binmode FRONT_OF_PIPE;
            print FRONT_OF_PIPE <$fh>;
            $fh->close()
                or return (undef, "Can't close input file handle: $!");
            close FRONT_OF_PIPE;
            exit;
        }

        # In parent
        close $fh;
        return $dfh;
    }

    my $fh = new FileHandle('mailarc-1.txt.gz');
    my $dfh = decompress_filehandle($fh);
    print $_ while <$dfh>;
    As before, this works on Unix but not Windows. It seems to hang trying to close the pipe to gzip -cd. I would have thought that this would be a common thing to do, even on Windows...
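
    For what it's worth, one way to sidestep the plumbing entirely is IPC::Run, which interleaves the reads and writes itself and does its own Win32-specific pipe handling. A sketch, with the gzip command and file name illustrative:

        use IPC::Run qw(run);

        open my $fh, '<', 'mailarc-1.txt.gz' or die "Can't open: $!";
        binmode $fh;    # compressed data is binary; this matters on Windows

        my $compressed = do { local $/; <$fh> };

        # IPC::Run pumps the data in and out itself, so neither pipe
        # buffer can fill up and cause the deadlock described above.
        run [ 'gzip', '-cd' ], '<', \$compressed, '>', \my $plain
            or die "gzip failed: $?";

        print $plain;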

      Why not simply pass the name of the file to the command, let it read from disk, and then pipe the output back to you? Why complicate things?
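
      For example (a sketch; gzip is assumed to be on the PATH):

          # Read-only pipe: gzip does the file IO and the script only
          # reads, so there is no second pipe to deadlock against.
          open my $dfh, 'gzip -cd mailarc-1.txt.gz |'
              or die "Can't start gzip: $!";
          print while <$dfh>;
          close $dfh or warn "gzip exited with status $?";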

      I'm also intrigued by why, given tye's educational explanation above, your method would work on Linux and not on Windows. The only rational explanation I can think of is that Linux uses bigger buffers on its pipes than Windows does, and the file you are testing with fits entirely within the buffer on the former, so it never blocks.

      Have you tried this on Linux with a bigger input file?


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
      If I understand your problem, I can solve it! Of course, the same can be said for you.

        Actually, I think the issue might be that pipes are done differently on Windows -- at least, this is true for any command-line usage (whether using the standard MS-DOS "command.com" or something else, like a Windows port of bash). Based on my own experience using pipeline commands on Windows and *nix, I have drawn the following conclusions (not backed up by any authoritative docs, but the behavior I observed seemed to make a pretty clear case):

        Basically, on *nix, when a command line involves two or more processes in a pipeline, the processes are loaded in reverse order: the last one on the command line starts first (and waits for input), and the first one is the last to be started. As data flows through the pipe, potentially all processes will be active simultaneously (unless the first one quits before its pipe buffer fills -- I think a common buffer size is 8K); output from the tail-end process will begin to appear as soon as it finishes its first buffer's worth of data.

        On Windows, the processes are run one at a time in lock-step: the first one runs, and the OS stashes its output in temp storage somewhere (presumably on disk, or using swap, or something). When it's done, it exits, the next process starts, and the OS feeds it the stuff that was gathered from the first one. And so on for each successive process in the pipeline.
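
        A quick way to observe which model you're on (a sketch; the one-liners are illustrative, and on Windows you'd swap the single quotes for double quotes):

            # The producer emits one line per second, unbuffered. If the
            # pipeline streams, "got: tick N" appears once per second; if
            # it runs in lock-step, all five lines show up after ~5 seconds.
            perl -e '$|=1; for (1..5) { print "tick $_\n"; sleep 1 }' | \
                perl -ne 'BEGIN { $|=1 } print "got: $_"'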

        Would this mean that open2 would never work properly on Windows? I don't know -- I never tried it.

        I would just pass the filename, but I can't do that if the data comes from STDIN. I'm guessing that if I follow Abigail's suggestion, I can be agnostic wrt the source of the data (file or stream).
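
        Something like this sketch might cover the STDIN case by spooling to a temporary file first (File::Temp and the bare gzip command are my assumptions here):

            use File::Temp qw(tempfile);

            # Spool STDIN to a temporary file so gzip can read from disk;
            # the script then only ever reads, so nothing can deadlock.
            binmode STDIN;      # the compressed stream is binary
            my ($tmp_fh, $tmp_name) = tempfile( UNLINK => 1 );
            binmode $tmp_fh;
            print {$tmp_fh} $_ while <STDIN>;
            close $tmp_fh or die "Can't close temp file: $!";

            open my $dfh, qq{gzip -cd "$tmp_name" |}
                or die "Can't start gzip: $!";
            print while <$dfh>;
            close $dfh;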