The reason why you should open, lock, read, process, write, then close is that it is the only safe approach. If you do anything else, then there is simply no way to know when you go to write whether the data you read is still valid.
Now further comments.
If you have performance problems, I would start to look for bottlenecks. Here are some places to look.
- Can you speed up what you are doing with the data from the files? For instance if you are loading a lot of data with require/do, then you may find using Storable to be much better.
- Is there redundant extra work you can find ways to avoid? For instance if you want to do a minor edit, you need to rewrite the whole file. With DB_File you can use the rather efficient Berkeley DB database which uses on-disk data structures that allow edits to only rewrite a small part of the file. (A tip. Look up BTREE in the documentation. For semi-random access of large data sets, a BTREE is significantly faster than hashing because it caches better.)
- Are there any major points of contention? For instance lots of processes may need to touch the same index file. But if you can get away with using the newer interface to Berkeley DB, BerkeleyDB, then you may be able to have them lock just the section they need, so that multiple processes can manipulate the file at once. Alternately you might split the index file out into multiple editable sections, and have a process produce the old index file through a routine merge.
- What does your directory structure look like? When people use flatfiles it is very easy to wind up with directories of thousands of files. However most filesystems have array-based implementations, so that results in a lot of repeated scanning of inodes to access files. This can kill performance. With access functions for your files you can turn large flat directories into nested trees which can be accessed much more efficiently.
- If you can put an abstraction API in front of the disk access, then you can move to a real database. This may give you huge performance benefits. (Not to mention internal sanity improvements.)
OK, that should be enough ideas to keep you busy for the next 6 months... :-)
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.