Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

let us say that I am trying to write a program, in perl (obviously), that will receive requests from a TCP socket of some sort and search through N tied hashes (which are tied to DBMs). And let's say that this is running on a big beefy 2-processor alpha box, so it is in my best interest to run this as a 2-process application, so that I can fully utilize both processors and the oodles of spindles that I have in this machine. Now, here is the problem: comparatively speaking, tying and untying hashes is an expensive proposition, and unfortunately I have MANY more hashes than I do available filehandles (and the situation is going to get worse). What I would like to implement is some type of shared-memory LRU cache that allowed both processes to share a list of commonly opened filehandles (tied hashes), so that we wouldn't have unnecessary duplication of workload.

What is the big problem? Well, ALL of the shared memory modules available for perl (at CPAN) and all of the cache modules available for perl (also at CPAN) have too many limitations, the most common one being that the only thing that can be shared is information which can be serialized, which pretty much leaves out filehandles right off the bat. So, does anyone have any ideas how I would share an LRU cache of filehandles among multiple processes?
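For reference, here is roughly what the single-process version of the cache looks like. This is only a minimal sketch: the DB_File module, the 64-handle cap, and the file name are placeholders for whatever your setup actually uses.

    use strict;
    use Fcntl;
    use DB_File;

    my $MAX_OPEN = 64;   # stay well under the per-process filehandle limit
    my %cache;           # DBM file name => tied hash ref
    my @lru;             # file names, most recently used last

    sub get_hash {
        my ($file) = @_;
        if (exists $cache{$file}) {
            @lru = ((grep { $_ ne $file } @lru), $file);   # move to MRU end
            return $cache{$file};
        }
        if (@lru >= $MAX_OPEN) {       # evict the least recently used hash
            my $victim = shift @lru;
            untie %{ $cache{$victim} };
            delete $cache{$victim};
        }
        tie my %h, 'DB_File', $file, O_RDONLY, 0644, $DB_HASH
            or die "can't tie $file: $!";
        $cache{$file} = \%h;
        push @lru, $file;
        return \%h;
    }

The catch, of course, is that %cache and @lru live inside a single process; what I am after is a way for both processes to share them.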


Re: shared memory LRU cache which contains filehandles
by lhoward (Vicar) on Jun 01, 2000 at 18:22 UTC
    Under UNIX I don't believe it is possible to share filehandles among processes, since the kernel's filehandle table is per-process (linked to the PID). A few other options come to mind:
    • Have N query-handling processes, each handling M of the DBM files (where M < the per-process filehandle limit). When you want to know if X is in any of the hashes, use IPC to ask each of your DBM query-handling processes whether it has X in any of its files; see the sketch after this list.
    • If you are daring, try perl's multithreading architecture with an approach similar to the one above. I'm not sure multithreading will get around the per-process filehandle limit, though.
    • Load the DBM files into an SQL DB and work off of that. This may not be a viable option if the DBM files change frequently (though if they do, you're already in deep trouble, and all the spiffy hardware in the world won't make this fast).
    Personally, I would try to find a way to load these files into an SQL DB; any other solution you come up with is likely not to scale very well.
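    Here is a rough sketch of the first option, with each worker tying its own share of the DBM files and answering lookups over a socketpair. The file names, the two-way partitioning, and the one-line protocol are all made up for illustration:

        use strict;
        use Socket;
        use IO::Handle;
        use Fcntl;
        use DB_File;

        my @partitions = ( ['a.db', 'b.db'], ['c.db', 'd.db'] );  # M files each
        my @workers;

        for my $files (@partitions) {
            socketpair(my $parent, my $child, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
                or die "socketpair: $!";
            $_->autoflush(1) for $parent, $child;
            defined(my $pid = fork) or die "fork: $!";
            if ($pid == 0) {               # worker: tie its share, serve lookups
                close $parent;
                my @hashes;
                for my $f (@$files) {
                    tie my %h, 'DB_File', $f, O_RDONLY, 0644, $DB_HASH
                        or die "tie $f: $!";
                    push @hashes, \%h;
                }
                while (my $key = <$child>) {
                    chomp $key;
                    my ($val) = grep { defined } map { $_->{$key} } @hashes;
                    print $child (defined $val ? "$val\n" : "\n");
                }
                exit 0;
            }
            close $child;                  # parent keeps its end of the pair
            push @workers, $parent;
        }

        sub lookup {                       # ask each worker; first hit wins
            my ($key) = @_;
            for my $w (@workers) {
                print $w "$key\n";
                chomp(my $ans = <$w>);
                return $ans if length $ans;
            }
            return undef;
        }

    A real version would query the workers in parallel instead of one at a time, but this shows the shape of it.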
Shared memory is too OS specific
by Corion (Patriarch) on Jun 01, 2000 at 18:29 UTC

    I guess the problem with those modules is that they are all too general/generic, and shared memory is a fairly OS-specific thing.

    One "solution" could be to fork() your second server after you have opened all necessary files in your first server, but this won't be a LRU cache then but simply shared file descriptors and won't help with your problem. Maybe you could rethink your problem and rearrange the hashes and files into one big (SQL) database to offload that burden to the database, but the "free" databases aren't really better than tied hashes I guess.

    One thing you could maybe do is, instead of splitting your application into two identical processes, split it into two different parts: a server part that acts (more or less) like a database server, and one or more clients (depending on the number of CPUs you get) that submit requests to the server. Depending on the data size, the data could be moved via shared memory, but the bottleneck would still be that one database server process ...
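    The client half of that split might look something like this; the host, port, and one-line protocol are invented for the sketch:

        use strict;
        use IO::Socket::INET;

        my $server = IO::Socket::INET->new(
            PeerAddr => 'localhost',
            PeerPort => 4242,        # wherever the server process listens
            Proto    => 'tcp',
        ) or die "connect: $!";
        $server->autoflush(1);

        sub lookup {
            my ($key) = @_;
            print $server "GET $key\n";          # invented request format
            chomp(my $reply = <$server>);
            return $reply eq 'NOTFOUND' ? undef : $reply;
        }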

    Update: A few people have said that (at least in their opinion) a free database should be at least on par with, if not better than, a tied hash. This can (IMO) only be true if the whole database structure is also changed and the access is moved completely to SQL (that is, JOINs are used, along with other stuff that allows the database to optimize the query).

      PostgreSQL is considerably better than a tied hash, and it is not just nominally free; it is genuine Free Software.
RE: shared memory LRU cache which contains filehandles
by Jonathan (Curate) on Jun 01, 2000 at 18:58 UTC
    I agree with lhoward's answer. Surely it's time to think of SQL and a relational database (MySQL etc.). I also disagree that the free databases are little better than tied hashes; MySQL has a very good reputation. In fact, IMHO, if your database has under 1000 tables and the key tables contain fewer than a million rows, then no commercial product is worth buying (especially Oracle :-).
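    Once the data is in MySQL, a lookup shrinks to a few DBI calls. A minimal sketch, with the DSN, table, and column names made up:

        use strict;
        use DBI;

        my $dbh = DBI->connect('DBI:mysql:database=lookups;host=localhost',
                               'user', 'password', { RaiseError => 1 });
        my $sth = $dbh->prepare('SELECT value FROM entries WHERE name = ?');

        sub lookup {
            my ($key) = @_;
            $sth->execute($key);
            my ($val) = $sth->fetchrow_array;
            $sth->finish;
            return $val;
        }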