Thanks for the explanation. What if I fork, and have the child process feed data to the parent? That way the parent can read from the pipe when it needs to, and the child can fill it independently of the parent. Here's the modified code:
use FileHandle;

sub pipe_from_fork ($) {
    my $parent = shift;
    pipe $parent, my $child or die "pipe() failed: $!";
    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid) {
        # Parent: keep the read end, close the write end.
        close $child;
    }
    else {
        # Child: close the read end and dup the write end onto STDOUT.
        close $parent;
        open(STDOUT, ">&=" . fileno($child)) or die "dup failed: $!";
    }
    $pid;
}
sub decompress_filehandle {
    my $fh = shift;
    my $dfh = new FileHandle;
    my $filter_command = 'c:\progra~1\cygwin\bin\gzip.exe -cd';
    # Must use pipe_from_fork because $dfh->open('-|') is not yet
    # implemented on Windows. See perlfork for details.
    unless (pipe_from_fork($dfh)) {
        # In child. Note: die (don't return) on failure here, so the
        # child can never fall through into the parent's code path.
        close $dfh;
        open(FRONT_OF_PIPE, "| $filter_command")
            or die "Can't execute \"$filter_command\" on file handle: $!";
        print FRONT_OF_PIPE <$fh>;
        $fh->close()
            or die "Can't close input file handle: $!";
        close FRONT_OF_PIPE;
        exit;
    }
    # In parent
    close $fh;
    return $dfh;
}

my $fh = new FileHandle('mailarc-1.txt.gz');
my $dfh = decompress_filehandle($fh);
print $_ while <$dfh>;
As before, this works on Unix but not Windows. It seems to hang trying to close the pipe to gzip -cd. I would have thought that this would be a common thing to do, even on Windows...
Why not simply pass the name of the file to the command and let it read from disk, and then pipe the output back to you? Why complicate things?
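That suggestion might look like the sketch below. The helper name is my own, and I use the core IO::Compress::Gzip module only to create self-contained test data; the key point is the single-string read-pipe open, which (unlike the '-|' fork-open) does work on Win32 builds of Perl:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);   # core module, used only to make test data

# Create a small .gz file so the example is self-contained.
gzip \"line one\nline two\n" => 'example.txt.gz'
    or die "in-core gzip failed: $GzipError";

# Hypothetical helper: hand the *file name* to the external gzip and
# read its output back. Nothing is written down a pipe from Perl, so
# there is nothing to deadlock on.
sub decompress_by_name {
    my ($path) = @_;
    open(my $dfh, qq{gzip -cd "$path" |})
        or die "Can't run gzip on $path: $!";
    return $dfh;
}

my $dfh = decompress_by_name('example.txt.gz');
print while <$dfh>;
close $dfh or die "gzip reported failure: $!";
unlink 'example.txt.gz';
```

This assumes a gzip binary on the PATH; substitute the full path (as in the code above) where needed.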
I'm also intrigued by why, given tye's educational explanation above, your method would work on Linux and not on Windows. The only rational explanation I can think of is that Linux uses bigger buffers on its pipes than Windows, and the file you are testing with fits entirely within the buffer on the former, so it never blocks.
Have you tried this on Linux with a bigger input file?
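One way to test that theory (the file names here are my invention) is to generate input well past any plausible pipe buffer -- Linux pipes are typically 64 KB -- and feed the result through the code above. If the buffer theory is right, it should hang on Linux too:

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);   # core module

# Write ~10 MB of compressible text, far beyond an 8 KB or 64 KB
# pipe buffer, then gzip it for use as test input.
open(my $out, '>', 'big.txt') or die "Can't write big.txt: $!";
print {$out} 'x' x 1023, "\n" for 1 .. 10_000;
close $out or die "close failed: $!";

gzip 'big.txt' => 'big.txt.gz' or die "gzip failed: $GzipError";
print -s 'big.txt', " bytes uncompressed, ", -s 'big.txt.gz', " bytes compressed\n";
```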
Actually, I think the issue might be that pipes are done differently on Windows -- at least, this is true for any command-line usage (whether using the standard MS-DOS "command.com" or something else, like a Windows port of bash). Based on my own experience using pipeline commands on Windows and *nix, I have drawn the following conclusions (not backed up by any authoritative docs, but the behavior I observed seemed to make a pretty clear case):
Basically, on *nix, when a command line involves two or more processes in a pipeline, the processes are loaded in reverse order: the last one on the command line starts first (and waits for input), and the first one is the last to be started. As data flows through the pipe, potentially all processes will be active simultaneously (unless the first one quits before its pipeline buffer fills -- I think a common buffer size is 8K); output from the tail-end process will begin to appear as soon as it finishes its first buffer's worth of data.
On Windows, the processes are run one at a time in lock-step: the first one runs, and the OS stashes its output in temp storage somewhere (presumably on disk, or using swap, or something). When it's done, it exits, the next process starts, and the OS feeds it the stuff that was gathered from the first one. And so on for each successive process in the pipeline.
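A quick way to probe that lock-step theory (the one-liners and quoting here are mine, written for a Unix shell; swap the quote styles around for cmd.exe): the upstream stage prints the time it wrote a line, and the downstream stage prints how many seconds later that line actually arrived. Concurrent streaming gives a delay near 0; a spool-then-run scheme would show the full 2-second sleep.

```perl
use strict;
use warnings;

# Upstream: write the current epoch time unbuffered (syswrite bypasses
# stdio buffering), then stay alive for 2 seconds before exiting.
# Downstream: subtract that timestamp from its own clock on arrival.
my $cmd = q{perl -e 'syswrite STDOUT, time()."\n"; sleep 2'}
        . q{ | perl -ne 'print time() - $_, "\n"'};
system($cmd) == 0 or die "pipeline failed: $?";
```

On Linux this prints 0 (or occasionally 1); a number near 2 would support the lock-step description.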
Would this mean that open2 would never work properly on windows? I don't know -- I never tried it.
I would just pass the filename, but I can't do that if the data comes from STDIN. I'm guessing that if I follow Abigail's suggestion, I can be agnostic wrt the source of the data (file or stream).
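If the data really can arrive on STDIN, one way to stay agnostic about the source (the helper name here is hypothetical; File::Temp is a core module) is to spool the stream to a temporary file first, then pass that file's name to the external command as suggested above:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical bridge: copy an already-open handle (e.g. STDIN) to a
# temp file, and return the file's name for use on a command line.
sub spool_to_tempfile {
    my ($fh) = @_;
    my ($tmp, $tmpname) = tempfile(UNLINK => 1);
    binmode $tmp;            # gzip data is binary
    binmode $fh;
    local $/ = \65536;       # read fixed 64 KB chunks, not lines
    print {$tmp} $_ while <$fh>;
    close $tmp or die "Can't finish writing $tmpname: $!";
    return $tmpname;
}

# Usage, with an in-memory handle standing in for STDIN:
open(my $in, '<', \"compressed bytes here") or die $!;
my $name = spool_to_tempfile($in);
print "spooled to $name\n";
```

The cost is one extra pass over the data on disk, but it sidesteps the write-pipe deadlock entirely on both platforms.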