in reply to Re: Re (tilly) 1: File locking, lock files, and how it all sucks
in thread File locking, lock files, and how it all sucks

The reason why you should open, lock, read, process, write, then close is that it is the only safe approach. If you do anything else, then there is simply no way to know when you go to write whether the data you read is still valid.
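
To make that concrete, here is a minimal sketch of the full cycle using flock with an exclusive lock; the counter file and its one-number format are made up purely for illustration.

    use strict;
    use warnings;
    use Fcntl qw(:DEFAULT :flock);

    my $file = "counter.txt";    # hypothetical data file

    # Open for read AND write before taking the lock.
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)                      or die "Cannot lock $file: $!";

    # Read and process while the exclusive lock is held.
    my $count = <$fh>;
    $count = defined $count ? $count + 1 : 1;

    # Write the new contents, then close - closing releases the lock.
    seek($fh, 0, 0)        or die "Cannot seek: $!";
    truncate($fh, 0)       or die "Cannot truncate: $!";
    print {$fh} "$count\n" or die "Cannot write: $!";
    close($fh)             or die "Cannot close $file: $!";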

Now for some further comments.

If you have performance problems, I would start by looking for bottlenecks. Here are some places to check.

  1. Can you speed up what you are doing with the data from the files? For instance, if you are loading a lot of data with require/do, then you may find Storable to be much faster (there is a small Storable sketch after this list).
  2. Is there redundant work you can find ways to avoid? For instance, with a flat file even a minor edit forces you to rewrite the whole file. With DB_File you can use the rather efficient Berkeley DB library, whose on-disk data structures let an edit rewrite only a small part of the file. (A tip: look up BTREE in the documentation. For semi-random access to large data sets, a BTREE is significantly faster than hashing because it caches better. There is a small DB_File sketch after this list.)
  3. Are there any major points of contention? For instance, lots of processes may need to touch the same index file. If you can get away with using the newer interface to Berkeley DB, BerkeleyDB, then you may be able to have each process lock just the section it needs, so that multiple processes can manipulate the file at once. Alternatively, you might split the index file into multiple independently editable sections and have a process produce the old index file through a routine merge.
  4. What does your directory structure look like? When people use flat files it is very easy to wind up with directories holding thousands of files. However, most filesystems store directory entries in what amounts to a linear list, so every access in a huge directory means scanning a long way through it. This can kill performance. With access functions in front of your files you can turn large flat directories into nested trees, which can be accessed much more efficiently (a sketch of such an access function follows this list).
  5. If you can put an abstraction API in front of the disk access, then you can move to a real database. This may give you huge performance benefits. (Not to mention internal sanity improvements.)
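
For point 1, here is a tiny sketch of what swapping require/do of a Perl-syntax data file for Storable looks like; the file name and the data in it are hypothetical.

    use strict;
    use warnings;
    use Storable qw(nstore retrieve);

    my %config = ( colour => 'blue', limit => 42 );    # made-up data

    # Write the structure once in Storable's binary format...
    nstore(\%config, 'config.stor') or die "nstore failed: $!";

    # ...and later loads are a fast binary read, not a parse of Perl code.
    my $loaded = retrieve('config.stor');
    print "limit is $loaded->{limit}\n";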
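
For point 2, a sketch of tying a hash to a DB_File BTREE, so that a one-entry update touches only a small part of the on-disk file; the file name and keys are invented for the example.

    use strict;
    use warnings;
    use Fcntl;
    use DB_File;

    my %index;
    tie %index, 'DB_File', 'index.db', O_RDWR | O_CREAT, 0644, $DB_BTREE
        or die "Cannot tie index.db: $!";

    # Updating one record does not force a rewrite of the whole file.
    $index{'user1234'} = 'last_seen=2001-08-21';

    untie %index;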
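
And for point 4, a sketch of the kind of access function that turns one huge flat directory into a shallow nested tree; the base directory, key scheme, and ".dat" suffix are all made up.

    use strict;
    use warnings;
    use File::Path qw(mkpath);

    # Map a record key to a path one directory deeper, so no single
    # directory accumulates thousands of files.
    sub path_for {
        my ($base, $key) = @_;
        my $bucket = lc substr($key . '__', 0, 2);   # pad very short keys
        my $dir    = "$base/$bucket";
        mkpath($dir) unless -d $dir;
        return "$dir/$key.dat";
    }

    # e.g. data/us/user1234.dat instead of data/user1234.dat
    print path_for('data', 'user1234'), "\n";
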
OK, that should be enough ideas to keep you busy for the next 6 months... :-)

Re: Re (tilly) 3: File locking, lock files, and how it all sucks
by tocie (Novice) on Aug 22, 2001 at 01:52 UTC

    Unfortunately, as I noted above, we have to be able to run "everywhere". That rules out Storable (Gods, I'd *KILL* to use Storable; it's made my life MUCH easier in other projects) and any sort of DBM file, since we cannot rely on anything that must be compiled on the system, may not be the version we expect, or may otherwise be out of date. (We've already ended up rolling our own mutations of common modules such as MIME::Lite and CGI just to avoid relying on preinstalled copies.)

    That throws 1, 2, and 3 right out the window. :( :( :(

    As for 4... the previous incarnation of this script occasionally had to read a couple hundred (or thousand) files at once. I've fixed that now. Unfortunately, eliminating that problem only let the other problems that had been lurking in the background come out. (You surely know the story - you have three bugs, so fixing one makes the other two show up even more... :) )

    5 is already in progress... not that it would ever be an official release. Management is sending mixed signals. (Thankfully, the product has a healthy community of code hackers who are continuously adding on and altering things... they'll figure out what I did sooner or later! :) )

    Thank you for the input... this is VERY valuable stuff!