in reply to Another way to avoid File::Find

Two problems with this.

First, the HUP signal might not be delivered one-to-one. Since the "notify the process that HUP has been received" is just a one-bit value in the process table, if the process doesn't get woken up quickly enough, two HUPs will be delivered as only one hit.
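
A minimal sketch (not from the original node, and timing-dependent — whether you see the effect varies with load and your perl's signal handling) that can show the coalescing:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Count HUPs as they arrive; classic Unix signals are not queued,
    # so a HUP that arrives while one is already pending collapses
    # into the pending one.
    my $hups = 0;
    $SIG{HUP} = sub { $hups++ };

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: fire three HUPs at the parent in quick succession.
        kill 'HUP', getppid() for 1 .. 3;
        exit 0;
    }

    sleep 2;            # give the signals time to arrive
    waitpid $pid, 0;
    print "sent 3 HUPs, handler counted $hups\n";   # may be less than 3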

Second, if you kill the child process before reading the names, you might not actually get to read the names, because that will all depend on buffering and flushing and such.
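
The risky pattern looks roughly like this (a sketch — the find invocation is just an assumption, and whether names are actually lost depends on how much output is still sitting in the child's stdio buffer when it dies):

    use strict;
    use warnings;

    # A piped open returns the child's pid.
    my $pid = open my $find, '-|', 'find', '.', '-type', 'f', '-links', '+1'
        or die "can't run find: $!";

    # ... parent decides it has seen enough signals ...
    kill 'TERM', $pid;      # child may die with names still in its stdio
                            # buffer, never flushed to the pipe
    my @names = <$find>;    # so this read can come up short
    close $find;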

Thus, I suggest you merely use an ordinary loop, and when you've read the Nth name, just close the handle. On the next write, the child will die anyway. If you really want to optimize that, read in the loop, and then kill the child when you've already read the name.
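
Here's a sketch of that ordinary loop (the find invocation is an assumption, chosen to match the hard-link hunt in the original node):

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $wanted = shift || 2;    # stop after this many names
    open my $find, '-|', 'find', '.', '-type', 'f', '-links', '+1'
        or die "can't run find: $!";

    my @names;
    while (my $name = <$find>) {
        chomp $name;
        push @names, $name;
        last if @names >= $wanted;   # got the Nth name: stop reading
    }
    close $find;    # the child takes a SIGPIPE on its next write and dies

    print "$_\n" for @names;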

Or, just write the loop using File::Find (no child process), because I bet that will be within striking distance of using the child anyway, and you can get precisely the semantics you want.
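
For instance, stopping after the Nth hard link might look like this (a sketch; die-ing out of the wanted callback, caught by an eval, is the usual way to stop File::Find early, since it has no built-in early exit):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    my $wanted = shift || 2;
    my @names;

    eval {
        find(sub {
            # regular files with more than one hard link
            return unless -f && (stat _)[3] > 1;
            push @names, $File::Find::name;
            die "enough\n" if @names >= $wanted;
        }, '.');
    };
    die $@ if $@ && $@ ne "enough\n";

    print "$_\n" for @names;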

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re^2: Another way to avoid File::Find
by graff (Chancellor) on Nov 18, 2006 at 20:01 UTC
    Thus, I suggest you merely use an ordinary loop, and when you've read the Nth name, just close the handle. On the next write, the child will die anyway. If you really want to optimize that, read in the loop, and then kill the child when you've already read the name.

    The reason that wouldn't work is that the output from the child "find" process will be block-buffered when it goes to a pipe, so it's just as likely that perl will have to wait until the child finishes before actually getting anything in the while loop. If there were a way to make the child use autoflush, that would solve it, but I didn't find a way to do that.
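
    (As an aside, a sketch on the assumption that GNU coreutils is available — its stdbuf wrapper, which postdates this thread, can force the child's stdout to line-buffering:)

        # Assumes GNU coreutils' stdbuf: force find's stdout to
        # line-buffered, so each name hits the pipe as it is printed.
        open my $find, '-|', 'stdbuf', '-oL', 'find', '.', '-type', 'f'
            or die "can't run find: $!";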

    Or, just write the loop using File::Find (no child process), because I bet that will be within striking distance of using the child anyway, and you can get precisely the semantics you want.

    The 6x (or worse) wall-clock slowdown that File::Find imposes (not to mention the excess CPU load that goes with it) would defeat the whole purpose of the exercise.

    the HUP signal might not be delivered one-to-one. Since the "notify the process that HUP has been received" is just a one-bit value in the process table, if the process doesn't get woken up quickly enough, two HUPs will be delivered as only one hit.

    Interesting -- it didn't happen when I tested (and the links were pretty close together in the directory tree), but maybe some other IPC method would nail this.

    (<update> Actually, I just tried another test: mkdir test; cd test; ln ../otherfile test1.link; ln ../otherfile test2.link -- that puts two links to one target right next to each other in a single directory. All three HUP signals got through as intended, so I don't think this is a problem -- and nothing was missing from the output list, so the next problem you mention seems moot as well.</update>)

    if you kill the child process before reading the names, you might not actually get to read the names, because that will all depend on buffering and flushing and such.

    Again, there was no such problem in an initial test, and I doubt there ever would be: since find's output is buffered, and the parent only kills it after getting its signal (which, as per the other theoretical problem you cited, the parent might not get), the full list of files will be there.

      Race conditions, in some instances, may not be reproducible on demand. The race condition is still there, however. Failing to observe the failure does not prove it cannot happen.

      In your case, with the current load on your machine, this may not crop up. Down the road, when the machine is more heavily loaded or the filesystem is being hammered, and you wonder why your process is hanging around without finding the last piece of data, remember Randal's words.

      Spoken from experience,

      --MidLifeXis