thejasviv has asked for the wisdom of the Perl Monks concerning the following question:

Hello, this could be more of a Linux/OS question, but I thought I would try it here anyway. I have a script that gets a directory handle through opendir() and then enters a loop. In each iteration of the loop, it should print all the files that were created in the directory since the last iteration. Here is the code:

opendir(my $dirh, "/somewhere") or die;
while (1) {
    while (my $file = readdir($dirh)) {
        print("$file\n");
    }
    sleep(1);
}

The first iteration works exactly as expected: it prints all the files currently present in the directory. Subsequent iterations, however, do not behave as I expected. After the first iteration I touched some files in the directory manually and expected them to show up in the script's output, but they didn't: readdir returned nothing at all.

Technically I can work around this by opening and closing the directory on each iteration and doing my own bookkeeping to track which files are new. However, I don't much like having to read the complete directory on every iteration when I know most of those files have not changed since the last one.

Any help or suggestions would be greatly appreciated.

Sincerely,
Thejasvi V

Re: The "readdir()" fails to detect files created after the call to "opendir()"
by Athanasius (Archbishop) on Jan 20, 2015 at 06:20 UTC
Re: The "readdir()" fails to detect files created after the call to "opendir()"
by roboticus (Chancellor) on Jan 20, 2015 at 12:08 UTC

    thejasviv:

    If you're on a Linux box, I'd suggest looking at Linux::Inotify2 to detect changes in a directory.
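    A minimal sketch of that approach, assuming a Linux host with Linux::Inotify2 installed from CPAN. The helper name watch_new_files is hypothetical, and watching IN_CREATE | IN_MOVED_TO (names created in, or renamed into, the directory) is one reasonable choice of event mask rather than the only one:

```perl
use strict;
use warnings;
use Linux::Inotify2;    # CPAN module, Linux only; exports the IN_* constants

# Hypothetical helper: watch $dir and call $on_new->($name) for each entry
# created in it or moved into it. Returns the inotify object; callbacks
# only run when the caller polls it.
sub watch_new_files {
    my ($dir, $on_new) = @_;
    my $inotify = Linux::Inotify2->new
        or die "Cannot create inotify object: $!";
    $inotify->watch($dir, IN_CREATE | IN_MOVED_TO, sub {
        my $event = shift;
        $on_new->($event->name);    # name relative to $dir
    }) or die "Cannot watch $dir: $!";
    return $inotify;
}

# Usage: block in poll() until events arrive, forever.
# my $inotify = watch_new_files("/somewhere", sub { print "$_[0]\n" });
# 1 while $inotify->poll;
```

    The kernel queues the events between polls, so the directory is never re-scanned. Detecting lines appended to an existing file would need IN_MODIFY or IN_CLOSE_WRITE in the mask as well.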

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I'm not a Linux person, but I was just thinking that even Windows has events to notify listeners about file-system changes. Glad to see that Linux does this too.

        SimonPratt:

        Yeah, I figured people may not be aware that Linux and Mac OS X also have filesystem events available. I try to use filesystem events when I can to avoid repeated directory scanning, especially when many directories would need scanning.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: The "readdir()" fails to detect files created after the call to "opendir()"
by soonix (Chancellor) on Jan 20, 2015 at 09:48 UTC

    You will have to use a work-around. Even if readdir worked the way you think, there would be no guarantee that new entries would be placed "at the end".

    And anyway, you haven't specified what you want to do about modified files.
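    One possible work-around that also covers modified files: on every pass, take a snapshot mapping each plain file's name to its size and mtime, and compare it with the previous snapshot. This is a sketch; treating a changed size or mtime as "modified" is an assumption, and the helpers snapshot and diff_snapshots are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: map each plain file in $dir to "size:mtime".
sub snapshot {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my %snap;
    while (defined(my $name = readdir $dh)) {
        my $path = "$dir/$name";
        next unless -f $path;                  # skips '.', '..', subdirs
        my ($size, $mtime) = (stat _)[7, 9];   # reuse the stat done by -f
        $snap{$name} = "$size:$mtime";
    }
    closedir $dh;
    return \%snap;
}

# Hypothetical helper: compare two snapshots and return
# (\@new, \@modified), each sorted by name. Deleted files are ignored.
sub diff_snapshots {
    my ($old, $new) = @_;
    my (@added, @changed);
    for my $name (sort keys %$new) {
        if (!exists $old->{$name})             { push @added,   $name }
        elsif ($old->{$name} ne $new->{$name}) { push @changed, $name }
    }
    return (\@added, \@changed);
}
```

    A polling loop would keep the previous snapshot in a variable, take a new one after each sleep, and pass both to diff_snapshots. Core Perl's stat reports mtime in whole seconds, so a same-size rewrite within one second goes unnoticed.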

Re: The "readdir()" fails to detect files created after the call to "opendir()" (iterators)
by LanX (Saint) on Jan 20, 2015 at 12:39 UTC
    More of a meta meditation:

    Actually, this (non-)behavior is consistent with most other iterators in Perl.

    For instance, modifying an @array while iterating over it with foreach is discouraged, and adding to a %hash while iterating over it with each leaves the iterator in an unspecified state. A normal filehandle is controlled by tell and seek (IIRC).

    So the default is to consider the iterated data as immutable.

    There is no generally accepted way to handle the other case except leaving it to the programmer to work around it with extra constructs.
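    One such extra construct, sketched under the assumption of a POSIX system: keep a single dirhandle, call rewinddir before each pass (POSIX specifies that rewinddir makes the stream reflect the directory's current contents, as a fresh opendir would), and filter the names through a %seen hash. The helper new_entries is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: rewind an open dirhandle and return, sorted, the
# names not reported by any earlier call. $seen persists between calls.
sub new_entries {
    my ($dh, $seen) = @_;
    rewinddir $dh;    # re-sync the stream with the directory's current contents
    my @new;
    while (defined(my $name = readdir $dh)) {
        next if $name eq '.' || $name eq '..';
        push @new, $name unless $seen->{$name}++;
    }
    my @sorted = sort @new;
    return @sorted;
}

# Usage: one handle for the lifetime of the loop.
# opendir(my $dh, "/somewhere") or die "Cannot open /somewhere: $!";
# my %seen;
# while (1) {
#     print "$_\n" for new_entries($dh, \%seen);
#     sleep 1;
# }
```

    This still reads the whole directory on each pass; it only avoids reopening the handle.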

    Besides, how should "a new file" be defined? Another name? Another inode? Other content? Changed timestamp?
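    The choice matters in practice. Renaming a file within one filesystem changes its name but keeps its device/inode pair, so a tracker keyed on names reports the renamed file as new while one keyed on the inode does not. A small sketch, with the hypothetical helper identity returning several candidate keys from Perl's stat:

```perl
use strict;
use warnings;

# Hypothetical helper: candidate "identities" for one path.
sub identity {
    my ($path) = @_;
    my @st = stat $path or die "Cannot stat $path: $!";
    return {
        inode => "$st[0]:$st[1]",    # device:inode, survives renames
        size  => $st[7],             # bytes
        mtime => $st[9],             # last content modification, seconds
    };
}
```

    Inode numbers can be reused once a file is deleted, so no single key answers every case.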

    YMMV!

    Cheers Rolf

    PS: Je suis Charlie!

Re: The "readdir()" fails to detect files created after the call to "opendir()"
by choroba (Cardinal) on Jan 20, 2015 at 12:58 UTC
    You need to reload the directory for each iteration of the outer loop:
    #!/usr/bin/perl
    use strict;
    use warnings;

    while (opendir my $dirh, "/somewhere/else") {
        while (my $file = readdir $dirh) {
            print "$file\n";
        }
        sleep 1;
    }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: The "readdir()" fails to detect files created after the call to "opendir()"
by thejasviv (Initiate) on Jan 21, 2015 at 01:06 UTC

    Thank you all for the responses. Yes, I had guessed something similar about the placement/order of new entries in the directory, but still wanted confirmation. Work-arounds such as close-and-reopen or rewinddir() work well.

    For those of you who wanted to know what "new" files meant and what I do with them: I consolidate files written to a common network location by multiple processes running on various servers. In each iteration, I check whether any new files (files with names I have not encountered so far) have been created in the directory. If so, I open read handles on those files and store them in a hash. I then iterate through all the filehandles, read any new lines, and write them all to the output file.
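    A minimal sketch of that consolidation loop, under assumptions not stated in the post: all input files sit in one directory, every plain file there is an input, and the output file lives elsewhere. The helper consolidate_pass is hypothetical. The key detail is seek($fh, 0, 1), the usual Perl "tail -f" idiom, which clears a handle's end-of-file state so lines appended later are read on the next pass:

```perl
use strict;
use warnings;

# One consolidation pass (hypothetical helper).
#   $dir     - directory the writers drop their files into
#   $handles - hashref of file name => read handle, kept between passes
#   $out     - output filehandle
sub consolidate_pass {
    my ($dir, $handles, $out) = @_;

    # Re-read the directory and open handles only for names not seen before.
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @fresh = grep { -f "$dir/$_" && !exists $handles->{$_} } readdir $dh;
    closedir $dh;
    for my $name (sort @fresh) {
        open(my $fh, '<', "$dir/$name") or next;   # not recorded; retried next pass
        $handles->{$name} = $fh;
    }

    # Copy every complete line appended since the previous pass.
    for my $name (sort keys %$handles) {
        my $fh = $handles->{$name};
        while (defined(my $line = <$fh>)) {
            if ($line !~ /\n\z/) {                 # writer is mid-line:
                seek($fh, -length($line), 1);      # back up, re-read next pass
                last;
            }
            print {$out} $line;
        }
        seek($fh, 0, 1);    # clear the EOF flag so later appends are read
    }
}
```

    Called once per second from a while (1) { ...; sleep 1 } loop, this re-reads the directory on each pass but opens each file only once, and leaves a half-written last line for a later pass.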

Re: The "readdir()" fails to detect files created after the call to "opendir()"
by locked_user sundialsvc4 (Abbot) on Jan 20, 2015 at 17:16 UTC

    “Directories,” and file-systems in general, are unpredictable data structures. Entries will not appear in any particular order. Therefore, you must read the entire directory contents list into memory, then sort that list yourself, then work with the sorted list. If you need to know what has appeared or disappeared, you must likewise do the comparison yourself.
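    A sketch of that advice, with hypothetical helper names: sorted_listing reads the whole listing and closes the handle before returning, and compare_listings walks the old and new sorted lists in step, like a merge, to report names that appeared or disappeared:

```perl
use strict;
use warnings;

# Hypothetical helper: read the whole listing, close the handle, sort.
sub sorted_listing {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    my @sorted = sort @names;    # default string order, matching 'lt' below
    return @sorted;
}

# Hypothetical helper: walk two sorted lists in step.
# Returns (\@appeared, \@disappeared).
sub compare_listings {
    my ($old, $new) = @_;
    my ($i, $j) = (0, 0);
    my (@appeared, @disappeared);
    while ($i < @$old || $j < @$new) {
        if ($j >= @$new || ($i < @$old && $old->[$i] lt $new->[$j])) {
            push @disappeared, $old->[$i++];   # only in the old list
        }
        elsif ($i >= @$old || $new->[$j] lt $old->[$i]) {
            push @appeared, $new->[$j++];      # only in the new list
        }
        else {
            $i++;                              # present in both
            $j++;
        }
    }
    return (\@appeared, \@disappeared);
}
```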

    Also, when you are scanning a directory, you should initiate the scan, run it fully to completion, then close it. Don’t fall asleep with a scan left open. The resource that you are using might be limited, and so your process might be exerting an unforeseen negative influence on its neighbors. (And, in the most benign of circumstances, you can easily fail to see files or see files more than once.)

    Note that some file metadata, such as most-recent-access timestamps, are not always supported. They require a large number of disk writes, for a benefit that might be judged not worth the effort (or, in the case of a truly read-only filesystem, not possible). So they might have been turned off by the system administrators, for example by mounting filesystems with the noatime option.

    Most operating systems provide a separate API by which you can “watch” a directory to detect what changes have recently been made to it, but these facilities are specific both to the OS type and to a particular networked file system. They might be privileged. They might be “convenient, but expensive.”