in reply to Re^4: Database in a folder?
in thread Database in a folder?
Dealing with the last two first. Those two requirements mean that:
I hope it will be clear that whilst your envisaged API would be relatively trivial to implement, even in a thread-safe manner, it would be very slow. Even with the OS filesystem cache working for you (and assuming your files are, and will remain, really quite small), almost every access requires reading and parsing the entire file, and every change also requires re-writing all of it.
And whilst filesystem locking is reliable, it imposes considerable wait-states upon the application. Get two or three threads competing for access to the same file (even to different keys within it) and it could take whole seconds to read or write a single value, i.e. hundreds of thousands of times slower than accessing a variable in memory.
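To make that cost concrete, here is a minimal Perl sketch of such an uncached API. The file layout (one file per top-level key, one `subkey=value` line per subkey) and the names `get_value`/`set_value` are hypothetical, chosen only for illustration. Every call opens the file, takes a `flock`, and parses the whole thing; every write also truncates and rewrites the whole file before the lock is released.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# ASSUMED (hypothetical) format: one file per top-level key, one
# "subkey=value" line per subkey. Subkeys must not contain "=", and
# neither part may contain a newline.

# Read and parse the whole file from the start. Caller holds the lock.
sub _parse {
    my ($fh) = @_;
    seek $fh, 0, 0 or die "seek: $!";
    my %h;
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /=/, $line, 2;
        $h{$k} = $v if defined $v;
    }
    return \%h;
}

# Uncached read: lock, read + parse the entire file, unlock.
sub get_value {
    my ($dir, $key, $subkey) = @_;
    open my $fh, '<', "$dir/$key" or return;    # no file => no value
    flock $fh, LOCK_SH or die "flock: $!";
    my $h = _parse($fh);
    close $fh;                                  # closing releases the lock
    return $h->{$subkey};
}

# Uncached write: lock, read + parse the entire file, change ONE value,
# then rewrite the entire file before unlocking.
sub set_value {
    my ($dir, $key, $subkey, $value) = @_;
    sysopen my $fh, "$dir/$key", O_RDWR | O_CREAT or die "open: $!";
    flock $fh, LOCK_EX or die "flock: $!";
    my $h = _parse($fh);
    $h->{$subkey} = $value;
    seek $fh, 0, 0 or die "seek: $!";
    truncate $fh, 0 or die "truncate: $!";
    print {$fh} "$_=$h->{$_}\n" for sort keys %$h;
    close $fh or die "close: $!";               # flushes, then unlocks
}
```

Calling `set_value($dir, 'config', 'colour', 'red')` and then `get_value($dir, 'config', 'colour')` returns `red`, but each call pays for an open, a lock and a full parse before any contention from other threads or processes is even considered.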
The "obvious" thing to do, then, is to cache the files in memory. To maintain coherence across threads, this would need to be shared memory which, whilst considerably slower than non-shared memory, is far faster than (even cached) disk. The problems with this are:
Changes will not be reflected on disk until the process "flushes its cache to disk". And that slows everything down to disk speed again.
Cache too many files, and/or hold them for too long, and you risk consuming large amounts of memory, possibly running out.
Cache too few, or flush too frequently, and you're back to the problems of uncached access: multiple disk accesses (lock; read; write; unlock) per subkey access or change.
Not to mention hardware failures, backups, etc.!
So, uncached, it is a relatively trivial thing to implement, but will be very, very slow.
In the face of concurrency (regardless of whether it's processes or threads), life gets very complicated, very quickly, especially if performance is any kind of factor at all. And if you need to cater for both process- and thread-level concurrency and coherence, it gets very, very complicated, and slow.
The archetypal solution to these problems is a serialised, client-server architecture (e.g. your typical RDBMS), but that is only truly effective if you perform queries and updates en masse. As soon as you start accessing and updating individual key/value pairs one at a time, you have to factor in the communications, request-serialisation, transaction and logging overheads, on top of the disk reads (and sometimes writes) that may still be needed. And of course, along the way you've lost your primary goal of human-editable persistent storage.
The simplest mechanism, if you can guarantee that only one multi-threaded process will ever need to be running at a time, would be to load the files into a shared hash of hashes at startup and only write it back to disk when the program shuts down.
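A minimal sketch of that simplest mechanism, assuming a threaded perl, a single process, and a hypothetical layout where each plain file in the folder is a top-level key whose lines are `subkey=value`. The names `load_all` and `save_all` are illustrative only, not an existing module. `shared_clone` is used because a shared hash can only hold references to other shared data.

```perl
use strict;
use warnings;
use threads;            # must be loaded before threads::shared
use threads::shared;

# ASSUMED (hypothetical) layout: every plain file in $dir is a top-level
# key, and each line in it is "subkey=value".

# Startup: load every file into one shared hash of hashes.
sub load_all {
    my ($dir) = @_;
    my $db = shared_clone({});
    opendir my $dh, $dir or die "opendir $dir: $!";
    for my $key (grep { -f "$dir/$_" } readdir $dh) {
        my %sub;
        open my $fh, '<', "$dir/$key" or die "open $dir/$key: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($k, $v) = split /=/, $line, 2;
            $sub{$k} = $v if defined $v;
        }
        close $fh;
        # Nested data must itself be shared before it goes into $db.
        $db->{$key} = shared_clone(\%sub);
    }
    closedir $dh;
    return $db;
}

# Shutdown: write every top-level key back to its own file.
sub save_all {
    my ($dir, $db) = @_;
    lock(%$db);
    for my $key (sort keys %$db) {
        my $sub = $db->{$key};
        open my $fh, '>', "$dir/$key" or die "open $dir/$key: $!";
        print {$fh} "$_=$sub->{$_}\n" for sort keys %$sub;
        close $fh or die "close $dir/$key: $!";
    }
}
```

Threads that modify values still need to agree on a lock, e.g. `{ lock(%$db); $db->{config}{hits}++ }`. And nothing reaches disk if the process dies before `save_all` runs, which is the price of never touching the disk in between.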
A slightly more sophisticated model, under the same assumptions as above, would be to wrap the existing threads::shared tie-like interface in a second level of tie that demand-loads individual files the first time they are accessed.
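One way such a second-level tie might look, as a sketch only: the class name `DemandLoadedDB`, its `flush` method and the `subkey=value` file layout are all hypothetical. The tie object is per-thread, but it holds a reference to a single shared cache, so a file loaded by one thread is visible to all of them. `DELETE` and `CLEAR` are deliberately left out.

```perl
use strict;
use warnings;
use threads;    # load threads before threads::shared in a threaded program

# Second-level tie over a shared hash: each file is loaded on first access.
# ASSUMED (hypothetical) layout: one file per top-level key in the folder,
# each line "subkey=value". DELETE and CLEAR are not implemented.
package DemandLoadedDB {
    use threads::shared;    # provides shared_clone in this package

    sub TIEHASH {
        my ($class, $dir) = @_;
        # Per-thread object, but ONE shared cache of loaded files.
        return bless { dir => $dir, cache => shared_clone({}) }, $class;
    }

    sub _load {    # returns a shared hash ref, or undef if no such file
        my ($self, $key) = @_;
        open my $fh, '<', "$self->{dir}/$key" or return;
        my %sub;
        while (my $line = <$fh>) {
            chomp $line;
            my ($k, $v) = split /=/, $line, 2;
            $sub{$k} = $v if defined $v;
        }
        close $fh;
        return shared_clone(\%sub);
    }

    sub FETCH {    # load the file the first time this key is touched
        my ($self, $key) = @_;
        my $cache = $self->{cache};
        lock(%$cache);
        if (!exists $cache->{$key}) {
            my $sub = $self->_load($key);
            $cache->{$key} = $sub if defined $sub;
        }
        return $cache->{$key};
    }

    sub STORE {    # $value should be a whole hash ref of subkey => value
        my ($self, $key, $value) = @_;
        my $cache = $self->{cache};
        lock(%$cache);
        $cache->{$key} = shared_clone($value);
    }

    sub EXISTS {
        my ($self, $key) = @_;
        my $cache = $self->{cache};
        lock(%$cache);
        return exists $cache->{$key} || -f "$self->{dir}/$key";
    }

    sub FIRSTKEY {    # union of files on disk and keys already cached
        my ($self) = @_;
        my $cache = $self->{cache};
        opendir my $dh, $self->{dir} or die "opendir: $!";
        my %all = map { $_ => 1 } grep { -f "$self->{dir}/$_" } readdir $dh;
        closedir $dh;
        { lock(%$cache); $all{$_} = 1 for keys %$cache; }
        $self->{pending} = [ sort keys %all ];
        return $self->NEXTKEY;
    }

    sub NEXTKEY { my ($self) = @_; return shift @{ $self->{pending} }; }

    sub flush {    # write every cached key back to disk, changed or not
        my ($self) = @_;
        my $cache = $self->{cache};
        lock(%$cache);
        for my $key (sort keys %$cache) {
            my $sub = $cache->{$key};
            open my $fh, '>', "$self->{dir}/$key" or die "open: $!";
            print {$fh} "$_=$sub->{$_}\n" for sort keys %$sub;
            close $fh or die "close: $!";
        }
    }
}
```

Usage would be along the lines of `tie my %db, 'DemandLoadedDB', $dir;` then `$db{config}{colour}` (which loads `config` on first access only) and finally `tied(%db)->flush`. New top-level keys are best stored as a whole hash ref (`$db{other} = { a => 1 }`); relying on autovivification through the tie is unlikely to do what you want, because `STORE` would be handed an empty, unshared hash.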
The problem is that shared hashes are already pretty slow because they combine tying and locking, and tie itself isn't quick. Combine the two and you're back to a fairly heavy performance penalty, though still far, far less than locking, reading and writing entire files for every key/value change.
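If you want to see how heavy that penalty is on your own perl, a quick `Benchmark` sketch like the following compares a plain hash, a plain tied hash (`Tie::StdHash`), a shared hash, and a shared hash locked on every update. The relative numbers depend entirely on your perl build and machine, so no results are shown here.

```perl
use strict;
use warnings;
use threads;            # so that :shared really creates shared variables
use threads::shared;
use Tie::Hash;          # defines Tie::StdHash, a plain tied hash
use Benchmark qw(cmpthese);

my %plain;
my %shared :shared;
tie my %tied, 'Tie::StdHash';

# Each sub updates 100 keys; -1 means run each for at least 1 CPU second.
cmpthese(-1, {
    plain         => sub { $plain{$_}++  for 1 .. 100 },
    tied          => sub { $tied{$_}++   for 1 .. 100 },
    shared        => sub { $shared{$_}++ for 1 .. 100 },
    # The lock is released at the end of each loop iteration's block.
    shared_locked => sub { for (1 .. 100) { lock(%shared); $shared{$_}++ } },
});
```

`cmpthese` prints a rate table with pairwise percentage differences, which makes it easy to see how much each layer (tie, sharing, locking) adds.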
Not what you'll want to hear, but maybe it'll help you reach a decision as to which way to go.
A general description of your envisaged application (CGI, command line or GUI; short- or long-running; volumes of data; numbers of keys/subkeys involved) might engender better-targeted responses or alternatives for your problem.
Re^6: Database in a folder?
by AriSoft (Sexton) on Feb 18, 2010 at 20:20 UTC
by BrowserUk (Patriarch) on Feb 18, 2010 at 23:23 UTC