Re: Saving and Loading of Variables

It might work to tie your hashes to dbm files (though for the large one, you might see intermittent slow-downs if the particular flavor of dbm you use has to rewrite its index table as the hash grows). DB_File (the interface to Berkeley DB) supports not only hash structures but also a record-number storage method that would work well for an array ($DB_RECNO), but you'd probably have to serialize each sub-array into a single scalar value in order to store it in the DB file.
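Here is a rough sketch of what that could look like. The file names and the sample record are made up for illustration, and JSON::PP is just one way to do the serializing; I picked it because, as far as I understand it, a $DB_RECNO file is a flat text file with newline-separated records, so a one-line text encoding seems safer there than a binary one.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT
use DB_File;      # exports $DB_HASH and $DB_RECNO
use JSON::PP;     # core module; encodes a structure as one line of text

# Hypothetical file names, chosen only for this sketch.
my ($hash_file, $array_file) = ('seen.db', 'rows.db');

# A hash that lives on disk instead of in memory.
tie my %seen, 'DB_File', $hash_file, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $hash_file: $!";

# An array that lives on disk; each element is one record.
tie my @rows, 'DB_File', $array_file, O_RDWR|O_CREAT, 0644, $DB_RECNO
    or die "Cannot tie $array_file: $!";

# Ordinary hash operations are written through to the file.
$seen{'host-a'}++;

# A sub-array has to be flattened to a single scalar before it is stored.
push @rows, encode_json([ 'host-a', 42, 'GET /index.html' ]);

# Reading it back: decode the scalar into an array reference again.
my $first = decode_json($rows[0]);
print "first row: @$first\n";

untie %seen;
untie @rows;
```

Since the data is on disk, running this a second time will find the earlier entries still there; that is the whole point, but it also means the count in %seen keeps going up.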

That will keep all your derived structural data on disk as the process runs and grows. Then all you need is a check-pointing strategy that stores the current byte offset into the input log file at regular intervals. On restarting after a shutdown, you should be able to open your DB files, seek to the last known offset in the log file, and resume reading and processing from there. Records handled after the last checkpoint will already be in the DB files, so check each record for matching values there and skip log records until you find novel data. (Or something to that effect.)
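A minimal sketch of the check-pointing part, under some assumptions of mine: the offset is kept in a small text file next to the log, it is saved every N records, and the sub names (load_offset, save_offset, process_log) are invented for this example. The handler you pass in is where the "is this already in the DB files?" check would go.

```perl
use strict;
use warnings;

# Return the last saved byte offset, or 0 if there is no checkpoint yet.
sub load_offset {
    my ($path) = @_;
    open my $fh, '<', $path or return 0;
    my $line = <$fh>;
    close $fh;
    return (defined $line && $line =~ /^(\d+)/) ? $1 : 0;
}

# Write to a temp file and rename it into place, so a crash in the
# middle of a write cannot leave a truncated checkpoint behind.
sub save_offset {
    my ($path, $offset) = @_;
    open my $fh, '>', "$path.tmp" or die "Cannot write $path.tmp: $!";
    print {$fh} "$offset\n";
    close $fh or die "Cannot close $path.tmp: $!";
    rename "$path.tmp", $path or die "Cannot rename $path.tmp: $!";
}

# Resume at the saved offset, hand each line to $handler, and save the
# offset again after every $every lines and once more at end of file.
# Returns the number of lines read in this run.
sub process_log {
    my ($log, $ckpt, $every, $handler) = @_;
    open my $in, '<', $log or die "Cannot read $log: $!";
    seek $in, load_offset($ckpt), 0 or die "Cannot seek in $log: $!";
    my $count = 0;
    while (my $line = <$in>) {
        $handler->($line);
        save_offset($ckpt, tell $in) if ++$count % $every == 0;
    }
    save_offset($ckpt, tell $in);
    close $in;
    return $count;
}
```

You would call it as something like process_log('input.log', 'input.log.offset', 1000, \&handle_record), where handle_record is your own code. A smaller interval means less re-reading after a crash at the cost of more checkpoint writes.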

(update: Oh yeah, and you should actually consider using a real database to keep track of this derived structural stuff -- it'll be much safer, more stable and accountable, easier and quicker to search and fetch back old information, and so on. With the right table schema, there will be a lot less coding to do, and the code you do write will be a lot more powerful.)
