in reply to Re: quick way to read a few directory entries of many
in thread quick way to read a few directory entries of many

I have an 'incoming' directory that may or may not contain files that have to be 'processed'.
Off and on there are a ton of files, sometimes in the tens of thousands.

The procedure that each file undergoes may be expensive, so I have a daemon sort of thing that will run x times during the day and maybe a lot during the night, or maybe whenever it detects that the CPU has been "idle" for x minutes.

So I take a few files, maybe ten, do something with them, sleep or check CPU usage, then iterate.
My frustration is that picking the files sometimes takes a third of the time of each iteration.

I am aware that I can cache the directory read data, etc. I am not looking to change what I am doing; I am looking for a way to pick some files out of many, quickly. I figure it's something worth setting a precedent for, for the future.
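
To make "pick some files out of many, quickly" concrete, here is a minimal sketch in Python, assuming the cost comes from reading every name into a list before choosing ten. `pick_some` is a made-up helper name. It relies on `os.scandir` handing back entries lazily, so iteration stops after the first few. The entries come back in whatever order the filesystem returns them, and nothing here checks whether a file is still being written.

```python
import os
from itertools import islice

def pick_some(directory, count=10):
    """Return up to `count` regular-file paths from `directory`.

    os.scandir() yields entries lazily, so we stop after the first few
    instead of building a list of every name in the directory.
    """
    with os.scandir(directory) as entries:
        # Skip subdirectories and symlinks; keep only plain files.
        files = (e.path for e in entries if e.is_file(follow_symlinks=False))
        return list(islice(files, count))
```

Calling `pick_some("incoming", 10)` on each iteration would return at most ten paths, however many files the directory holds.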

Maybe you are suggesting I could pipe in the file data directly from pointers to the dir struct or something funny like that? (ext3)

Re^3: quick way to read a few directory entries of many (inotify)
by almut (Canon) on Jun 05, 2008 at 18:58 UTC

    Maybe you could take advantage of Linux's inotify mechanism (and Linux::Inotify2), if you're on Linux, that is; your mention of ext3 suggests you might be. In other words, you could scan the entire directory once, and then update the resulting data structure as you receive events about individual files being added, removed, etc. Just an idea...
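
The bookkeeping half of that idea could look roughly like the sketch below (in Python). It covers only the "scan once, then update" data structure. The `IncomingIndex` class and its method names are invented for illustration, and the part that receives inotify events (for example through Linux::Inotify2) and calls `on_created` / `on_removed` is left out.

```python
import os

class IncomingIndex:
    """In-memory set of pending file names for one directory (hypothetical helper)."""

    def __init__(self, directory):
        self.directory = directory
        self.pending = set()

    def scan(self):
        # The one expensive full read of the directory.
        with os.scandir(self.directory) as entries:
            self.pending = {e.name for e in entries if e.is_file()}

    def on_created(self, name):
        # Call when the watcher reports a new file (or one moved in).
        self.pending.add(name)

    def on_removed(self, name):
        # Call when the watcher reports a file deleted or moved out.
        self.pending.discard(name)

    def take(self, count=10):
        # Hand out up to `count` paths without touching the disk.
        batch = [self.pending.pop() for _ in range(min(count, len(self.pending)))]
        return [os.path.join(self.directory, name) for name in batch]
```

After the initial `scan()`, each iteration would call `take(10)` and never re-read the directory, as long as the events keep the set in step with what is on disk.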

Re^3: quick way to read a few directory entries of many
by chrism01 (Friar) on Jun 06, 2008 at 05:36 UTC
    Depends on your definition of 'processed'. I've had similar jobs, but the files only needed to be processed once. More accurately, each instance of a file only needed to be processed once, so I always mv the 'done' files to an archive dir and gzip them as I process them.
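
A rough sketch of that archive step (in Python), under the assumption that compressing straight into the archive directory is an acceptable stand-in for a separate mv followed by gzip. `archive_done` and the directory layout are hypothetical names, not anything from the jobs described above.

```python
import gzip
import os
import shutil

def archive_done(path, archive_dir):
    """Gzip `path` into `archive_dir` and remove the original.

    Returns the path of the compressed copy.
    """
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(path) + ".gz")
    # Stream the file through gzip so large files are not held in memory.
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Only drop the original once the compressed copy has been written.
    os.remove(path)
    return target
```

Because finished files leave the incoming directory, it only ever holds unprocessed work, which also keeps each directory read short.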