What have you tried thus far? The kind of task you are thinking of (keeping track of files already processed, along with their timestamps) tends to make me think of a hash. Since this info must be made persistent, just try one of the various serialization modules available from CPAN, ranging from Storable to YAML::Syck, depending on your actual needs.
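For example, with Storable the whole thing is a couple of calls; this is a minimal sketch, and the state file name and log name are placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical state file; any writable path will do.
my $state_file = 'processed.db';

# Load the hash of already-processed files (filename => epoch time),
# or start fresh if no state file exists yet.
my %processed = -e $state_file ? %{ retrieve($state_file) } : ();

# ... process new files, recording each one as you go ...
$processed{'access.log.1'} = time();

# Persist the hash for the next run.
store(\%processed, $state_file);
```

YAML::Syck works the same way via DumpFile/LoadFile if you'd rather have a human-readable state file.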
Interesting, I'll have to look at Storable for some of my own stuff.
It looks to me like you may not really need to keep track of what you've already done at all. Simply keeping track of when you last ran and then examining all files which have been modified since then should accomplish the same thing while also being substantially simpler.
Caveats:
1) This assumes that the time spent in each processing run is minimal. If it takes 5 minutes, then, yeah, it won't work because the logs are probably going to change during that time.
2) If your situation is such that you need to provide 'proof' that certain files have been processed at certain times then, of course, you'll need the records to satisfy those requirements.
I've done something similar. What I do may not be the most ideal, but it works. I have a job that runs once a day (on several servers) and moves logs from Directory A on a specific server to a central log repository on another server. When the job starts, it reads in a file containing a list of the logs that have already been moved. Then it gets a list of files in the directory and moves anything that is not on the list. Not perfect, since it relies on an external file, but it works pretty nicely. What you might want is a list of files that have already been processed, stored in some sort of Perl data structure.
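A rough sketch of that list-file approach, assuming hypothetical directory and list-file names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);

# Move every file in $src_dir that is not already recorded in
# $list_file over to $dest_dir, appending new names to the list.
sub move_new_logs {
    my ($src_dir, $dest_dir, $list_file) = @_;

    # Slurp the already-moved names into a hash for O(1) lookups.
    my %moved;
    if (open my $in, '<', $list_file) {
        chomp(my @lines = <$in>);
        @moved{@lines} = ();
        close $in;
    }

    opendir my $dh, $src_dir or die "Cannot open $src_dir: $!";
    open my $out, '>>', $list_file
        or die "Cannot append to $list_file: $!";
    for my $file (grep { -f "$src_dir/$_" } readdir $dh) {
        next if exists $moved{$file};
        move("$src_dir/$file", "$dest_dir/$file")
            or die "Move failed for $file: $!";
        print $out "$file\n";
    }
    close $out;
    closedir $dh;
}

# Example call with placeholder paths:
# move_new_logs('logs_in', 'log_repo', 'moved.list');
```

Appending to the list as each file moves (rather than rewriting it at the end) means a crash mid-run won't cause already-moved files to be moved again.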