cnd has asked for the wisdom of the Perl Monks concerning the following question:
My unsolved quest for efficient variable (memory) sharing on Linux continues.
I have a monster in-memory hash (many gigabytes, up to 200 GB in the future) and a nice Perl sub which does lookups and returns results for me.
I want *other* processes to be able to do lookups as well, obviously without every process loading its own duplicate copy of the hash.
The rate of lookups is extreme: *many* millions per minute, at least.
Can anyone think of an efficient way to code this? The "lookup" child processes will be mod_perl (web cgi); I'm hoping the giant-hash process can simply be a local daemon.
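To give the shape of it, here's a stripped-down sketch of what I mean (the names and the lookup body are made up for illustration; my real sub is more involved):

    use strict;
    use warnings;

    # Daemon side: one process owns the giant hash.
    my %giant_hash;               # many GB of key => value pairs, loaded once

    sub lookup {
        my ($key) = @_;
        return $giant_hash{$key}; # plain in-memory hash lookup, no copying
    }

    # What each mod_perl child would ideally do, millions of times a minute,
    # against the *same* physical memory rather than a private duplicate:
    my $result = lookup('some_key');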
I want zero unnecessary overhead: no interprocess communication if possible (no sockets, no messages), no memory shuffling (I don't want to move gigs of hash out of a shared memory pool into the web server's pool before starting the lookup), and no serializing...
Thoughts?
Has anyone ever built a modern Perl from source? What about the idea of modding the source to introduce some kind of shared region that way? e.g., every variable whose name matches "shared_*" uses a different region of RAM, or something like that?