(... for unix/linux folks only) The problem posed by this SoPW node implies a need to quickly scan for hard links: given a path to search in, and a target file whose inode number may be shared by some number of hard links under that path, find those links (i.e. all references to that inode number), and bail out as soon as you've found them all.

Since timing is an issue, the general slowness of File::Find (relative to the compiled "find" command) could pose a real problem -- especially on very large directory trees -- so the challenge is: how to get the find command to do the right thing, and quit when there are no more links to be found?

The cool thing here is how the IPC is done: child sends HUP signals to the parent each time a link is found (via the "-exec" option on unix find), parent closes everything up as soon as the expected link count is reached.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my ( $path, $file ) = @ARGV;
    die "Usage: $0 search/path data.file\n" unless ( -d $path and -f $file );

    # "-f $file" was the last filetest, so "stat _" reuses its stat buffer
    my ( $inode, $nlinks ) = ( stat _ )[1,3];
    die "$file has no hard links\n" if $nlinks == 1;

    my ( $chld, $nfound, @found );
    $SIG{HUP} = sub { $nfound++; kill 'TERM', $chld if $nfound == $nlinks };
    $chld = open( FIND, "-|", "find $path -inum $inode -print0 -exec kill -HUP $$ \\;" )
        or die "find: $!\n";

    $/ = chr(0);
    while ( <FIND> ) {
        chomp;
        push @found, $_;
    }
    printf( "found %d of %d links for inode %s in %s:\n",
            scalar @found, $nlinks, $inode, $path );
    print join( "\n", @found ), "\n";

Re: Another way to avoid File::Find
by merlyn (Sage) on Nov 18, 2006 at 10:59 UTC
    Two problems with this.

    First, the HUP signal might not be delivered one-to-one. Since the "notify the process that HUP has been received" is just a one-bit value in the process table, if the process doesn't get woken up quickly enough, two HUPs will be delivered as only one hit.

    Second, if you kill the child process before reading the names, you might not actually get to read the names, because that will all depend on buffering and flushing and such.

    Thus, I suggest you merely use an ordinary loop, and when you've read the Nth name, just close the handle. On the next write, the child will die anyway. If you really want to optimize that, read in the loop, and then kill the child when you've already read the name.

    Or, just write the loop using File::Find (no child process), because I bet that will be within striking distance of using the child anyway, and you can get precisely the semantics you want.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Thus, I suggest you merely use an ordinary loop, and when you've read the Nth name, just close the handle. On the next write, the child will die anyway. If you really want to optimize that, read in the loop, and then kill the child when you've already read the name.

      The reason that wouldn't work is that the output from the child "find" process will be buffered, and it's just as likely perl will have to wait till the child finishes before actually getting anything in the while loop. If there's a way to make the child use autoflush, this would solve it, but I didn't find a way to do that.

      Or, just write the loop using File::Find (no child process), because I bet that will be within striking distance of using the child anyway, and you can get precisely the semantics you want.

      The 6x (or worse) wall-clock slowdown that File::Find imposes (not to mention the extra CPU load that goes with it) would defeat the whole purpose of the exercise.

      the HUP signal might not be delivered one-to-one. Since the "notify the process that HUP has been received" is just a one-bit value in the process table, if the process doesn't get woken up quickly enough, two HUPs will be delivered as only one hit.

      Interesting -- it didn't happen when I tested (and the links were pretty close together in the directory tree), but maybe some other IPC method would nail this.

      (<update> Actually, I just tried another test: mkdir test; cd test; ln ../otherfile test1.link; ln ../otherfile test2.link -- that puts two links to one target right next to each other in a single directory. All three HUP signals got through as intended, so I don't think this is a problem -- and nothing was missing from the output list, so the next problem you mention seems moot as well.</update>)

      if you kill the child process before reading the names, you might not actually get to read the names, because that will all depend on buffering and flushing and such.

      Again, there was no such problem in an initial test, and I doubt there ever would be: since find's output is buffered, and the parent only kills the child after receiving the last signal (which, per the other theoretical problem you cited, it might not get at all), the full list of files will already be in the pipe by the time the kill happens.

        Race conditions, in some instances, cannot be reproduced on demand -- but the race condition is still there. Failing to observe the failure does not prove it can never happen.

        In your case, with the current load on your machine, this may not crop up. Down the road, when the machine is more loaded, or the filesystem is being hammered, when you wonder why your process is hanging around without finding the last piece of data, remember Randal's words.

        Spoken from experience,

        --MidLifeXis