Special_K has asked for the wisdom of the Perl Monks concerning the following question:

I have the following code segment in a much larger script:


if ($myfile =~ /\.gz$/) {
    open(MYFILE, "/bin/zcat $myfile |") || die("ERROR: Cannot open $myfile for read: $!\n");
} else {
    open(MYFILE, "$myfile") || die("ERROR: Cannot open $myfile for read: $!\n");
}
my $myfile_timestamp = (stat(MYFILE))[9];
close(MYFILE);

When I run the script that contains the above, I receive the following error at the "close(MYFILE)" line:


gzip: stdout: Broken pipe

But the script runs to completion without any other apparent issues. If I just copy/paste the above code into a standalone script and hardcode the path to $myfile, the error doesn't occur.

If I add "use autodie;" to the top of the script, the script exits at the "close(MYFILE)" line with the following messages:


gzip: stdout: Broken pipe
Can't close filehandle 'MYFILE': '' at myscript.pl line 1014

If I add the following code either before or after the timestamp assignment, the error doesn't occur:


while (<MYFILE>) { chomp($_); }

If I don't close the filehandle, the error doesn't occur and it doesn't seem to cause any other problems in the script.

If I change the assignment to $myfile_timestamp to a constant as opposed to reading an attribute from the filehandle, the error is the same.

Does anyone know what could be causing the broken pipe error? Is there any way I can debug it further? Reading from the pipe seems to provide a workaround but I'm not sure why it's necessary. This same code structure (doing either a standard open/read for a non-gzipped file or a zcat pipe open for a gzipped file) is used elsewhere in the script and doesn't ever give a "broken pipe" error.

Re: zcat pipe gives "gzip: stdout: Broken pipe" error
by Fletch (Bishop) on Mar 25, 2025 at 16:18 UTC

    In your first snippet you're not reading the output from zcat, so it's griping because you're closing its output handle "prematurely". If you actually read the output, as you do when you add your while loop, you hit EOF and it's happy.

    That being said it really doesn't make sense to stat the filehandle from the pipe read because that's not telling you anything about the underlying file; to get metadata about the path $myfile points to you'd just want to call stat with that path.

      > In your first snippet you're not reading the output from zcat, so it's griping because you're closing its output handle "prematurely". If you actually read the output, as you do when you add your while loop, you hit EOF and it's happy.

      True, but why does the same error not occur when I use that code in a standalone script?

      > That being said it really doesn't make sense to stat the filehandle from the pipe read because that's not telling you anything about the underlying file; to get metadata about the path $myfile points to you'd just want to call stat with that path.

      According to the official documentation (https://perldoc.perl.org/functions/stat), stat is called using a FILEHANDLE or DIRHANDLE, yet the examples on that page all use stat($filename). I just tested it and calling stat() on a filename returns the exact same value as calling it on an open handle to that filename. The documentation seems to have conflicting information.
        > stat is called using a FILEHANDLE or DIRHANDLE, yet the examples on that page all use stat($filename)

        If the $filehandle is associated with a real file, calling stat on it works the same as using the filename directly. But calling stat on a pipe makes no sense, as Perl has no idea what file(s) the command opens and processes.
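A minimal sketch of the point above, assuming a POSIX-like system. The temporary file and the child command are placeholders, not part of the original script. It compares stat on a path with stat on a handle opened on that path, then checks what stat reports for a pipe handle.

```perl
use strict;
use warnings;
use Fcntl qw(:mode);
use File::Temp qw(tempfile);

# Create a small real file to compare against (placeholder data).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "hello\n";
close($tmp_fh) or die "Cannot close $path: $!";

# stat by path and stat by handle describe the same file.
open(my $file_fh, '<', $path) or die "Cannot open $path: $!";
my $mtime_by_path   = (stat($path))[9];
my $mtime_by_handle = (stat($file_fh))[9];
close($file_fh);

# stat on a pipe handle describes the pipe, not any file the child reads.
open(my $pipe_fh, '-|', $^X, '-e', 'print "hello\n"')
    or die "Cannot start child: $!";
my $pipe_mode = (stat($pipe_fh))[2];
my @lines = <$pipe_fh>;    # read to EOF so the child finishes cleanly
close($pipe_fh) or die "Child failed: status $?";

my $pipe_is_fifo = S_ISFIFO($pipe_mode) ? 1 : 0;
print "same mtime via path and handle: ",
    ($mtime_by_path == $mtime_by_handle ? "yes" : "no"), "\n";
print "pipe handle is a FIFO: ", ($pipe_is_fifo ? "yes" : "no"), "\n";
```

For the original script this suggests taking the timestamp with (stat($myfile))[9] and opening the pipe only when the contents are actually needed.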

Re: zcat pipe gives "gzip: stdout: Broken pipe" error
by ikegami (Patriarch) on Mar 26, 2025 at 14:17 UTC

    Difference: Your large program uses $SIG{ PIPE } = "IGNORE";.

    Solution: Add local $SIG{ PIPE } = "DEFAULT"; in scope of your open. Alternatively, you could redirect zcat's STDERR to /dev/null.


    Take a look at this:

    seq 1000000000 | gzip | zcat | head

    It should take a long time to generate and unzip that stream, but it doesn't.

    $ time ( seq 1000000000 | gzip | zcat | head )
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

    real    0m0.051s
    user    0m0.056s
    sys     0m0.005s

    This is what happens:

    1. Once head has printed its ten lines, it exits.
    2. This breaks the pipe from zcat to head.
    3. The next time zcat attempts to write to the pipe, SIGPIPE is sent to it.
    4. Upon receiving SIGPIPE, zcat is killed.
    5. gzip is similarly killed by SIGPIPE.
    6. seq is similarly killed by SIGPIPE.

    The above is what happens in your standalone script.

    In your larger program, you appear to have used $SIG{ PIPE } = "IGNORE";. It's still just as quick, but the error message now appears.

    $ seq 1000000000 | gzip | perl -e'$SIG{ PIPE } = "IGNORE"; exec("zcat")' | head
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    gzip: stdout: Broken pipe

    This is what happens now:

    1. zcat inherits its parent's disposition of ignoring SIGPIPE.
    2. Once head has printed its ten lines, it exits.
    3. This breaks the pipe from zcat to head.
    4. The next time zcat attempts to write to the pipe, SIGPIPE is sent to it.
    5. zcat ignores the SIGPIPE.
    6. The write call returns error EPIPE.
    7. zcat exits with a message as a result of the error. (zcat is an alias for gzip, which is why the message says gzip.)
    8. gzip is killed by SIGPIPE.
    9. seq is killed by SIGPIPE.
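The two sequences above can be reproduced without zcat. This is a sketch under assumptions: the helper name early_close_status and the child one-liner are made up for illustration, and the child is just another perl that writes forever and exits with code 42 if a write fails.

```perl
use strict;
use warnings;
use POSIX qw(SIGPIPE);

# Hypothetical helper: start a child that writes forever, read one line,
# close the pipe early, and return the child's wait status ($?).
sub early_close_status {
    my ($disposition) = @_;

    # The child inherits this disposition across fork and exec.
    local $SIG{PIPE} = $disposition;

    my @cmd = ($^X, '-e', '$| = 1; while (1) { print "x\n" or exit 42 }');
    open(my $fh, '-|', @cmd) or die "Cannot start child: $!";
    my $first = <$fh>;    # read a single line, like head stopping early
    close($fh);           # closes the read end, then waits for the child
    return $?;
}

my $default = early_close_status('DEFAULT');
my $ignored = early_close_status('IGNORE');

# With DEFAULT the child is killed by SIGPIPE (low 7 bits of $?).
printf "DEFAULT: child killed by signal %d\n", $default & 127;

# With IGNORE the write fails with EPIPE and the child exits by itself.
printf "IGNORE:  child exited with code %d\n", $ignored >> 8;
```

In the IGNORE case the child gets the chance to react to the failed write, which is the point where gzip prints its "Broken pipe" message.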

    Add local $SIG{ PIPE } = "DEFAULT"; in scope of your open.
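One way that could look, as a sketch only: open_maybe_gz is a hypothetical helper name, and the demonstration at the bottom uses a plain temporary file so that it runs even where /bin/zcat is not installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns a read handle for a plain or gzipped file.
sub open_maybe_gz {
    my ($path) = @_;
    my $fh;
    if ($path =~ /\.gz$/) {
        # Only the zcat child needs the default disposition. "local"
        # restores the caller's setting when this block is left.
        local $SIG{PIPE} = 'DEFAULT';
        open($fh, '-|', '/bin/zcat', $path)
            or die "ERROR: Cannot open $path for read: $!\n";
    } else {
        open($fh, '<', $path)
            or die "ERROR: Cannot open $path for read: $!\n";
    }
    return $fh;
}

# Demonstration: the surrounding program ignores SIGPIPE, as the larger
# script is assumed to do.
$SIG{PIPE} = 'IGNORE';

my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "first line\n";
close($tmp_fh) or die "Cannot close $path: $!";

my $fh = open_maybe_gz($path);
my $first_line = <$fh>;
close($fh);

# The timestamp comes from the path, not from the handle.
my $timestamp = (stat($path))[9];
my $disposition_after = $SIG{PIPE};
print "read: $first_line";
print "SIGPIPE disposition afterwards: $disposition_after\n";
```

Because the change is scoped with local, the rest of the program keeps whatever SIGPIPE handling it had before the call.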

      Wow, I did not know signal disposition could persist across an exec(). TIL...

        Linux's https://man7.org/linux/man-pages/man2/execve.2.html says:

        All process attributes are preserved during an execve(), except the following:

        • The dispositions of any signals that are being caught are reset to the default (signal(7)).

        • [...]

        So a caught signal such as $SIG{ PIPE } = \&handler; gets reset to $SIG{ PIPE } = "DEFAULT";, but not $SIG{ PIPE } = "IGNORE";.

        Most, if not all, of the attributes that get reset are reset out of necessity. A handler such as $SIG{ PIPE } = \&handler can't be kept because &handler stops existing once the new program replaces the old one.
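A small sketch of that rule. The helper name disposition_seen_by_child is made up, and the probe assumes that a freshly started perl reports an inherited ignored signal as "IGNORE" in %SIG and leaves the entry undefined otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: set the parent's SIGPIPE disposition, then run a
# child perl (fork + exec) that reports the disposition it started with.
sub disposition_seen_by_child {
    my ($setting) = @_;
    local $SIG{PIPE} = $setting;

    # Assumption: %SIG shows "IGNORE" for an inherited ignored signal and
    # is undefined for a signal left at its default.
    my $probe = 'print defined $SIG{PIPE} ? $SIG{PIPE} : "DEFAULT"';

    open(my $fh, '-|', $^X, '-e', $probe) or die "Cannot start child: $!";
    my $seen = do { local $/; <$fh> };
    close($fh) or die "Child failed: status $?";
    return $seen;
}

my $after_ignore  = disposition_seen_by_child('IGNORE');
my $after_handler = disposition_seen_by_child(sub { warn "SIGPIPE caught\n" });

print "parent ignored SIGPIPE, child sees: $after_ignore\n";
print "parent caught SIGPIPE,  child sees: $after_handler\n";
```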

Re: zcat pipe gives "gzip: stdout: Broken pipe" error
by duelafn (Parson) on Mar 26, 2025 at 13:50 UTC

    Unrelated to your problem, but for reliability and security, I would recommend the 3+ argument form of open,

    open(MYFILE, "-|", "/bin/zcat", $myfile) || die("ERROR: Cannot open $myfile for read: $!\n");

    Or better use a lexical scalar,

    my $fh;
    if ($myfile =~ /\.gz$/) {
        open($fh, "-|", "/bin/zcat", $myfile) || die("ERROR: Cannot open $myfile for read: $!\n");
    }
    ...
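To illustrate the security side of this recommendation, here is a sketch with a made-up hostile file name. The child is another perl that simply echoes its first argument; with the list form of open no shell is involved, so the name arrives as one argument and the "; echo" part is never executed.

```perl
use strict;
use warnings;

# Hypothetical file name containing shell metacharacters.
my $tricky_name = 'report; echo injected.gz';

# List-form pipe open: the command and each argument are passed directly
# to the child, without a shell in between.
my $echo_arg = 'print $ARGV[0]';
open(my $fh, '-|', $^X, '-e', $echo_arg, $tricky_name)
    or die "Cannot start child: $!";
my $received = do { local $/; <$fh> };
close($fh) or die "Child failed: status $?";

print "child received: [$received]\n";
```

With the two-argument form "/bin/zcat $myfile |", the same name would be handed to a shell, which would treat the semicolon as a command separator.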

    Good Day,
        Dean

Re: zcat pipe gives "gzip: stdout: Broken pipe" error
by cavac (Prior) on Apr 01, 2025 at 13:06 UTC

    I haven't tested it, but there is PerlIO::gzip, which could allow you to do something like this:

    use PerlIO::gzip;
    use Carp;
    ...
    my $iolayer = '';
    if($filename =~ /\.gz$/i) {
        $iolayer = ':gzip';
    }
    open($ifh, '<' . $iolayer, $filename) or croak("$!");
    ...
    # Process data
    while((my $line = <$ifh>)) {
        chomp $line;
        if($line =~ /bla/) {
            doBla($line);
        } else {
            doBlub($line);
        }
    }
    ...
    close $ifh;

    No pipes, no external program to worry about, just basic IO stuff.
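A related sketch that swaps in a different module: IO::Uncompress::Gunzip, which ships with Perl core, instead of PerlIO::gzip. The sample file and its contents are placeholders. It reads one line and closes early, with no child process that could complain about a broken pipe.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a small gzipped sample file (placeholder data).
my $dir     = tempdir(CLEANUP => 1);
my $gz_path = "$dir/sample.txt.gz";
my $data    = "first line\nsecond line\n";
gzip(\$data => $gz_path) or die "gzip failed: $GzipError";

# Decompress in-process; no pipe and no external zcat.
my $z = IO::Uncompress::Gunzip->new($gz_path)
    or die "Cannot open $gz_path: $GunzipError";
my $first_line = $z->getline();
$z->close();    # closing before EOF is harmless here
chomp $first_line;

# File metadata still comes from the path.
my $mtime = (stat($gz_path))[9];
print "first line: $first_line\n";
```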
