http://qs1969.pair.com?node_id=1187111

nysus recently asked whether there is a way to get at all the currently open filehandles. Together with Discipulus I concocted a module which does that. It records each open and close along with its timestamp, and drops filehandles from the track record as soon as they become undefined or go out of scope.

package FileHandle::Track;

use Time::HiRes qw(gettimeofday);
use Hash::Util::FieldHash qw(id_2obj);

my %fd;

BEGIN {
    Hash::Util::FieldHash::fieldhash %fd;
    my $open = sub {
        @_ > 2 ? open $_[0],$_[1],$_[2] : open $_[0], $_[1];
    };
    my $close = sub { close $_[0] };
    *CORE::GLOBAL::open = sub {
        my $result = $open->(@_);
        if ($result) {
            $fd{$_[0]}->{open} = join " ",@_[1,2],caller;
            $fd{$_[0]}->{opentime} = join ".", gettimeofday;
        }
        $result;
    };
    *CORE::GLOBAL::close = sub {
        my $result = $close->(@_);
        $fd{$_[0]}->{close} = join " ", caller;
        if ($result) {
            $fd{$_[0]}->{close} .= " (closed)";
        }
        else {
            $fd{$_[0]}->{close} .= " (close failed)";
        }
        $fd{$_[0]}->{closetime} = join ".", gettimeofday;
        $result;
    };
}

sub get_fds {
    return { map { id_2obj($_), $fd{$_} } keys %fd };
}

Once I have turned this into a proper module (tests, documentation with due credits), I'll upload it to CPAN.

Any suggestions, criticism, or enhancements?

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

Re: Track open file handles
by Eily (Monsignor) on Apr 05, 2017 at 15:40 UTC

    id_2obj returns undef when I call open with a glob or a bareword rather than a scalar; id_2obj($_) // $_ fixed that.

    I don't get the point of using Hash::Util::FieldHash and id_2obj though, since creating the anonymous hash returned by get_fds will only keep the stringified version of the object. Not having a Hash::Util::FieldHash hash, and not calling id_2obj, gives the same result.

    I thought it would be a good idea to do all the work (fill the hash) in $open and $close, and then goto those functions rather than call them, to remove the extra stack entry. This would have made your module transparent when using autodie and a verbose Carp (autodie croaks). But your $open and $close subs are bypassed anywhere autodie is on. It does work again when lexically disabling autodie with no autodie;
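    The stack-frame point can be illustrated with a minimal sketch. It is not the module's code; the sub names traced, wrap_call and wrap_goto are hypothetical. It only shows that goto &sub hides the wrapper from caller(), whereas an ordinary call does not.

```perl
use strict;
use warnings;

# Returns the source line of whoever called this sub, as seen by caller().
sub traced { return (caller)[2] }

# Ordinary call: traced() sees wrap_call itself as its caller.
sub wrap_call { return traced(@_) }

# goto &sub replaces the current frame, so traced() sees wrap_goto's caller.
sub wrap_goto { goto &traced }

# Capture the reported line and the real call-site line in one statement.
my ($via_call, $line_a) = (wrap_call(), __LINE__);
my ($via_goto, $line_b) = (wrap_goto(), __LINE__);

print "plain call reports line $via_call (call site: line $line_a)\n";
print "goto &sub  reports line $via_goto (call site: line $line_b)\n";
```

    With the plain call the reported line is the one inside the wrapper; with goto it is the line of the original call site, which is what a verbose Carp message would then show.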

      I don't get the point of using Hash::Util::FieldHash and id_2obj though, since creating the anonymous hash returned by get_fds will only keep the stringified version of the object.

      The point of using Hash::Util::FieldHash is that a key/value pair in a fieldhash is deallocated automatically when its underlying object goes out of scope or is destroyed. With a normal hash, adding an entry at open and deleting it at close, I would lose the filehandles that have been successfully opened and closed and have gone out of scope, but have not been destroyed because a dangling reference to them remains.

      The stringified filehandle object... well, that is due to indecision. If the anonymous hash returned by get_fds held the filehandle objects themselves, their reference count would be increased. If that returned result is not deallocated, further leakage would ensue, which would defeat the whole purpose of the module. OTOH, getting a list of open filehandles for a subsequent close would be a nice thing, too.
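      The automatic cleanup described above can be seen in isolation with a small sketch. It assumes nothing beyond the core Hash::Util::FieldHash module; the hash name %tracked and the in-memory file are illustrative only and not part of the module.

```perl
use strict;
use warnings;
use Hash::Util::FieldHash qw(fieldhash);

fieldhash my %tracked;          # hypothetical stand-in for the module's %fd
my ($inside, $outside);

{
    my $data = "demo";
    # An in-memory file avoids depending on anything on disk.
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    $tracked{$fh} = { open => 'demo' };   # keyed by the handle reference
    $inside = keys %tracked;              # counted while $fh is still alive
}
$outside = keys %tracked;                 # counted after $fh went out of scope

print "entries while the handle is alive: $inside\n";
print "entries after the handle is gone:  $outside\n";
```

      Nothing deletes the entry explicitly; it goes away when the last reference to the handle does, which is why a handle held alive by a stray reference stays visible in the record.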

      Fix for the bareword issue (also for open *STDERR etc):

      *CORE::GLOBAL::open = sub {
          my $result = $open->(@_);
          my $ref = ref $_[0] ? $_[0] : *{$_[0]}{GLOB};
          if ($result) {
              $fd{$ref}->{open} = join " ",@_,caller;
              $fd{$ref}->{opentime} = join ".", gettimeofday;
          }
          $result;
      };
      *CORE::GLOBAL::close = sub {
          my $result = $close->(@_);
          my $ref = ref $_[0] ? $_[0] : *{$_[0]}{GLOB};
          $fd{$ref}->{close} = join " ", caller;
          if ($result) {
              $fd{$ref}->{close} .= " (closed)";
          }
          else {
              $fd{$ref}->{close} .= " (close failed)";
          }
          $fd{$ref}->{closetime} = join ".", gettimeofday;
          $result;
      };

      update: alternative class method/function get_fds:

      sub get_fds {
          return {
              map {
                  my $fd = id_2obj $_;
                  $fd => { fd => $fd, %{$fd{$_}} }
              } keys %fd
          };
      }

        Without Hash::Util::FieldHash, no reference to the filehandle is kept anywhere (since the hash would only keep a stringified version), so there's nothing preventing it from being destroyed. But it does make sense to stop tracking a filehandle if it can't be accessed anywhere because the variables holding it are all out of scope.

        I think returning the filehandle is a good idea. Although it can be used for cleanup, that should only be a temporary solution until the real issue with the code is fixed. And if someone trying to fix their code just keeps collecting references to their handles without releasing them, the problem probably is not with your module in the first place. Maybe you can keep a weakened reference to the handle though; that way, unless the user keeps a copy of that reference, you have access to the handle without preventing its deletion.
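        The weakened-reference idea could look roughly like this. It is a sketch only, using Scalar::Util's weaken; the %handles registry and its layout are assumptions made for illustration, not what the module does.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my %handles;                    # hypothetical registry: "GLOB(0x...)" => weak ref
my ($key, $alive, $gone);

{
    my $data = "demo";
    open my $fh, '<', \$data or die "cannot open in-memory file: $!";
    $key = "$fh";               # the stringified handle serves as the key
    $handles{$key} = $fh;
    weaken $handles{$key};      # the registry no longer keeps the handle alive
    $alive = defined $handles{$key};
}
# The only strong reference ($fh) is gone, so the weak copy is now undef.
$gone = !defined $handles{$key};

print "handle reachable while in scope: ", ($alive ? "yes" : "no"), "\n";
print "handle released after scope:     ", ($gone  ? "yes" : "no"), "\n";
```

        Unlike a fieldhash, a plain hash with weak values keeps the key around after the handle is destroyed, so such a registry would still need to prune entries whose value has become undef.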

Re: Track open file handles
by mr_mischief (Monsignor) on Apr 05, 2017 at 17:45 UTC

    I would see great value in the solution if I saw value in creating the problem it solves. This seems useful perhaps as an aid in the toolkit for refactoring or debugging existing bad code. When writing new code I think it'd be easier, more productive, and less invasive to just keep track of the resources a program needs to use within the program in the first place.

      Of course, yes! Normally there's no need to keep track of filehandles if you use them properly. This module is meant as a tool to track down bugs, and to find out that the bugs aren't where you think they are, because bugs are always elsewhere. So maybe it should live in the Devel namespace, e.g. Devel::FileHandle::Track or so.

      ...if I saw value in creating the problem it solves.

      The only value in creating problems is to learn how not to create them in the first place.


        It could theoretically come in handy as part of a test suite for a distribution that must track handles (because it maintains several at any given time), to ensure none accidentally leak during a run.

        My File::Edit::Portable came to mind immediately. Although there's thorough coverage already, it's one place something like this would be quite useful.

Re: Track open file handles
by salva (Canon) on Apr 06, 2017 at 07:57 UTC
    There is a bug where it says...
    @_ > 2 ? open $_[0],$_[1],$_[2] : open $_[0], $_[1];

    open can accept more than three arguments (e.g. when piping to or from an external program), so you should use this instead...

    @_ > 2 ? open $_[0],$_[1],@_[2..$#_] : open $_[0], $_[1];
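    A quick way to check the corrected line is a list-form pipe open, which passes more than three arguments. The sketch below assumes a platform where list-form pipe opens are supported; the closure $open merely mimics the module's helper, and the child command is an arbitrary example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's $open closure, with the fix applied.
my $open = sub {
    @_ > 2 ? open($_[0], $_[1], @_[2 .. $#_]) : open($_[0], $_[1]);
};

# List-form pipe open: mode, then the command, then the command's arguments.
# $^X is the running perl binary, so no external tool is required.
$open->(my $fh, '-|', $^X, '-e', 'print qq(hello\n)')
    or die "cannot start child process: $!";

my $line = <$fh>;
close $fh or warn "close failed: $!";
print $line;
```

    With the original three-argument version, the trailing arguments would be dropped silently and the child would be started without them.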

    Also, I am not sure that would work for the cases where the second argument is a string with a filehandle embedded, as for instance open($fh, ">&STDOUT") or its three-arg form open($fh, ">&", "STDOUT"). You will probably need to prefix those names with the calling package name.

      Thank you for spotting the bug.

      Also, I am not sure that would work for the cases where the second argument is a string with a filehandle embedded, as for instance open($fh, ">&STDOUT") or its three-arg form open($fh, ">&", "STDOUT"). You will probably need to prefix those names with the calling package name.

      Yes, true. It's actually more complicated than that. A lexical filehandle is made into a GLOB reference by open. But since that open() happens in FileHandle::Track, the associated symbol is generated using that package and the lexical filehandle variable:

      package blorf;
      open my $h,">","blorfldyick";
      print $h,$/;
      print *{$h},$/
      __END__
      GLOB(0xbf4e78)
      *blorf::$h

      I can see no way to circumvent that. Not that it matters much, because it doesn't matter to the lexical filehandle which holds the GLOB reference. It's just that e.g. Data::Dumper shows all filehandles as belonging to the FileHandle::Track package when dumping the hashref returned from get_fds... meh :-(

      $VAR1 = {
          'GLOB(0x151ce78)' => {
              'fd' => \*{'FileHandle::Track::$_[...]'},
              'open' => 'GLOB(0x151ce78) > blorfldyick open main -e 1',
              'opentime' => '1491480967.655713'
          }
      };

        Yes, true. It's actually more complicated than that. A lexical filehandle is made into a GLOB reference by open. But since that open() happens in FileHandle::Track, the associated symbol is generated using that package and the lexical filehandle variable

        Ah, but that is yet another different problem affecting only the name of the glob, which is something quite unimportant.

        I was talking about filehandles passed as arguments by name. open looks them up in the caller's package, so you would probably have to add specific code to handle those cases and qualify the filehandle names yourself, or use some other trick, like creating the wrapper in the DB namespace, or using XS, etc.
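        Qualifying the names yourself might be sketched with the core Symbol module, as below. The helper resolve_handle and the package name My::App are hypothetical; in a real wrapper the package would come from caller, and this is only one possible approach.

```perl
use strict;
use warnings;
use Symbol qw(qualify qualify_to_ref);

# Hypothetical helper: turn a handle argument into a glob reference,
# resolving bare names relative to the given (caller's) package.
sub resolve_handle {
    my ($handle, $pkg) = @_;
    return ref $handle ? $handle : qualify_to_ref($handle, $pkg);
}

my $log = resolve_handle('LOG',    'My::App');   # name looked up in My::App
my $out = resolve_handle('STDOUT', 'My::App');   # STDOUT always lives in main

# For the dup forms (">&NAME"), the name string itself can be qualified.
my $dup_target = qualify('LOG', 'My::App');

# Stringify the globs to show which symbols were chosen.
my @names = map { "" . *{$_} } $log, $out;
print "$_\n" for @names, $dup_target;
```

        Symbol treats the standard handle names such as STDIN, STDOUT and STDERR specially and always places them in main, which matches how open itself resolves them.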

        update:

        I can see no way to circumvent that

        Probably some module exists on CPAN allowing you to change the glob name.

        How about $_[0] = eval "*".caller."::G".(0+\$_[0]) unless ref $_[0] eq "GLOB" or ref \$_[0] eq "GLOB"; before the call to $open? This prevents the call to the standard open from autovivifying a GLOB when none is available.

Re: Track open file handles
by Discipulus (Canon) on Apr 06, 2017 at 07:43 UTC
    It is cool, and it answers nysus's original question well.

    Thanks for mentioning me even though I did nearly nothing; I was playing with the ${^LAST_FH} variable, which I had never seen before, with no success.

    The resulting module can be helpful in some situations: developing an application that reads many files, or injecting it into an existing codebase to spot filehandle leaks.

    To be complete, it may be worth adding some optional overrides: sysopen, opendir and closedir, for example.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.