Monk::Thomas has asked for the wisdom of the Perl Monks concerning the following question:

The following code fragment is part of a nagios check that is running on about 70 hosts. There's one host which fails the close() and I don't know why. Maybe some exotic edge case?
use strict;
use warnings;
require 5.008_008;
use Carp;
use English qw( -no_match_vars );
[...]
open my $fh, '-|', "$command" or carp "Error while executing command:$ERRNO\n";
my @command_output = <$fh>;
close $fh or carp "ERROR: Couldn't close filehandle: $ERRNO\n";
(I'm wondering why there's carp instead of croak, but that's a different issue.)

The open/close is part of a subroutine, $command is provided as a parameter. In this case it's "LC_ALL=C sudo mpt-status --newstyle --probe_id" and the command output is:

Checking for SCSI ID:0
ioc:0 vol_id:0 type:IM raidlevel:RAID-1 num_disks:2 size(GB):148 state: OPTIMAL flags: ENABLED
ioc:0 phys_id:1 scsi_id:8 vendor:ATA product_id:SAMSUNG HE160HJ revision:0-24 size(GB):149 state: ONLINE flags: NONE sync_state: 100 ASC/ASCQ:0xff/0xff SMART ASC/ASCQ:0xff/0xff
ioc:0 phys_id:0 scsi_id:1 vendor:ATA product_id:ST3160812AS revision:J size(GB):149 state: ONLINE flags: NONE sync_state: 100 ASC/ASCQ:0xff/0xff SMART ASC/ASCQ:0xff/0xff
ioc:0 spare_id:2 scsi_id:255 vendor: product_id: revision: size(GB):149 state: MISSING flags: OUT_OF_SYNC sync_state: n/a ASC/ASCQ:0x00/0x00 SMART ASC/ASCQ:0x00/0x00
scsi_id:1 100%
scsi_id:0 100%

The server is running Ubuntu 12.04 LTS, like at least 40 other ones. (The other ones don't exhibit this behaviour.) The installed perl version is 5.14.2-6ubuntu2.4.

ERRATA: I provided the wrong command, it wasn't 'LC_ALL=C sudo mpt-status --controller $id' but 'LC_ALL=C sudo mpt-status --newstyle --probe_id'. The output is/was correct.

Replies are listed 'Best First'.
Re: close $fh fails on a single host - looking for explanation
by marto (Cardinal) on Feb 13, 2014 at 10:55 UTC

    Looks to me like a RAID issue. Has the server in question had disk/controller problems? I don't think this is a Perl issue.

    Update: for context, see Reconstruction:

    "In short: quite often you get a temporary failure of several disks at once; afterwards the RAID superblocks are out of sync and you can no longer init your RAID array."

      The server had a disk problem in the past. Hmm. The command's exit code is 16. Could this provoke perl into closing the filehandle automagically?

      I wasn't expecting perl to close a pipe filehandle on a non-zero exit code [1], but that would explain what is going on: you can't close a non-existent filehandle.

      [1] My assumption: either always close a pipe after exhausting the input or always leave it open.

      P.S.: The RAID has been rebuilt successfully. The controller is just complaining that the failed disk has not been properly removed from its configuration yet.

        CONFIRMED. The non-zero exit code is the culprit.

        implemented solution:
        open my $fh, '-|', "$command || true"

        alternative solution: do not close the filehandle
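
For reference, perldoc -f close describes this case for piped opens: close waits for the child process, returns false if the program exits with a non-zero status, and sets $! to 0 when that was the only problem, while the exit status goes into $?. That would also account for the empty text after "filehandle:" in the carp message. A minimal sketch that reproduces it, with sh -c 'echo some output; exit 16' standing in for the real mpt-status call (the stand-in command and the sub name probe_close are assumptions for illustration only):

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command through a read pipe and report
# what close() did. Returns (output lines ref, close result, errno, exit code).
sub probe_close {
    my ($command) = @_;

    open my $fh, '-|', $command
        or die "Cannot start '$command': $!";
    my @output = <$fh>;

    # close() on a pipe waits for the child and returns false
    # if the child exited with a non-zero status.
    my $closed = close $fh;
    my $errno  = $! + 0;     # only meaningful when close failed; 0 means "exit status only"
    my $exit   = $? >> 8;    # exit code of the child

    return ( \@output, $closed ? 1 : 0, $errno, $exit );
}

# Stand-in for the real command: prints one line, then exits with code 16.
my ( $out, $closed, $errno, $exit ) =
    probe_close(q{sh -c 'echo some output; exit 16'});
printf "closed=%d errno=%d exit=%d lines=%d\n",
    $closed, $errno, $exit, scalar @$out;
```

Appending "|| true" makes close succeed because the shell then exits with 0, but it also discards the exit code 16, which a monitoring check may want to report.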

Re: close $fh fails on a single host - looking for explanation
by kcott (Archbishop) on Feb 14, 2014 at 04:19 UTC

    G'day Monk::Thomas,

    The documentation for the close function has information on filehandles associated with pipes. It includes this example code which (with appropriate modification) may be a better choice for your script:

    ...
    close OUTPUT
        or warn $! ? "Error closing sort pipe: $!"
                   : "Exit status $? from sort";

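
Adapted to the lexical read pipe from the question, the documentation's pattern might look like the following sketch. The sub name read_command_output, its return values, and the -1 marker for a genuine close error are assumptions made for illustration, not part of the original script:

```perl
use strict;
use warnings;
use Carp;

# Hypothetical wrapper around the question's open/read/close sequence.
# Returns the output lines and the child's exit code (-1 on a real close
# error), warning with a message that says which of the two cases occurred.
sub read_command_output {
    my ($command) = @_;

    open my $fh, '-|', $command
        or do { carp "Error while executing command: $!"; return; };
    my @lines = <$fh>;

    my $exit_code = 0;
    if ( !close $fh ) {
        if ($!) {
            # A system call involved in closing the pipe failed.
            carp "Error closing pipe from '$command': $!";
            $exit_code = -1;
        }
        else {
            # The command ran but exited non-zero; $? holds its wait status.
            # (A child killed by a signal would also need $? & 127 checked.)
            $exit_code = $? >> 8;
            carp "Exit status $exit_code from '$command'";
        }
    }
    return ( \@lines, $exit_code );
}
```

A caller could then do my ($lines, $code) = read_command_output($command); and decide for itself whether a non-zero $code is fatal, which sidesteps the carp versus croak question inside the subroutine.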
    "(I'm wondering why there's carp instead of croak, but that's a different issue.)"

    carp will output a message and keep going while croak will output a message and terminate the script (see Carp for details).

    You say the code you posted "is part of a subroutine". There could be any number of reasons for not terminating the script at this point. Without knowing what else the script is doing, it's impossible to say (including whether croak would be more appropriate).

    -- Ken

Re: close $fh fails on a single host - looking for explanation
by choroba (Cardinal) on Feb 13, 2014 at 10:54 UTC
    Where is the "ERROR: Couldn't close filehandle" line?
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      $ ./check_raid
      ERROR: Couldn't close filehandle: at ./check_raid line 123
      (line 123 is 'close $fh')
        Maybe it's a problem with English; what are the values of $! and %!? Try sub Fudge
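
One way to answer that question is to look at $! in numeric and string context and to list the symbolic names that %! (provided by the core Errno module) reports as set. The following is only a sketch; describe_errno is a hypothetical helper, and the value passed in stands for whatever $! holds right after the failing close:

```perl
use strict;
use warnings;
use Errno ();    # core module that ties the %! hash

# Hypothetical helper: show an errno value numerically, as text,
# and as the symbolic name(s) that %! reports as true for it.
sub describe_errno {
    my ($errno) = @_;
    local $! = $errno;                  # set errno for the duration of this sub

    my $number = $! + 0;                # numeric context: the errno value
    my $text   = "$!";                  # string context: the system message
    my @names  = sort grep { $!{$_} } keys %!;

    return sprintf 'numeric=%d text="%s" names=%s',
        $number, $text, ( @names ? join( ',', @names ) : '(none)' );
}

print describe_errno(0), "\n";    # what a failed pipe close leaves behind on non-zero exit
print describe_errno(2), "\n";    # 2 is ENOENT on Linux
```

With use English qw( -no_match_vars ), $ERRNO and %ERRNO are aliases for $! and %!, so the same inspection applies to them.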