Re: Share hash across processes
by jettero (Monsignor) on Jun 08, 2007 at 13:37 UTC
That kind of sharing probably requires copying to a certain extent. You might end up using Storable to push it into a shared scalar, or Cache::Memcached to do some of the magic for you, but I don't think there's a magic way to share a nested structure without ... nesting on either end of the share. I could be mistaken, but I don't think you can.
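For instance, a minimal sketch of the Storable route, assuming threads::shared provides the shared scalar (the same freeze/thaw dance applies to any other kind of share):

use strict;
use warnings;
use threads;
use threads::shared;
use Storable qw( freeze thaw );

my $frozen : shared;              # one shared scalar holds the frozen bytes

my %nested = ( key => { inner => [ 1, 2, 3 ] } );
$frozen = freeze \%nested;        # nest one end: serialize into the share

threads->create( sub {
    my $copy = thaw $frozen;      # nest the other end: the reader re-builds
    print $copy->{key}{inner}[2], "\n";    # prints 3
} )->join;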
If you choose something like DB_File or DBM::Deep, you'll probably find the use of the local filesystem (if available) to be pretty darned efficient. IIRC, DB_File even handles buffering for you (the Berkeley DB magic).
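A minimal sketch of the DB_File route (the filename is made up, and concurrent writers would need locking on top of this):

use strict;
use warnings;
use DB_File;
use Fcntl;

# Every process ties the same file; Berkeley DB does the buffering.
tie my %hash, 'DB_File', '/tmp/share.db', O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "tie failed: $!";

$hash{key} = 'x' x 2e6;                   # goes to the file, not to a copy in RAM
print length( $hash{key} ), " bytes\n";

untie %hash;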
It is possible to write your own hash class with tie. You could make each key a shared scalar (using whatever method you like). Perhaps that's the best way if you have something really specific in mind, because then you can explicitly state what you want and perl will do it. :)
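A skeleton of such a tie class might look like this; plain hash storage stands in for whatever per-key sharing method you pick, and the STORE/FETCH bodies are where it would plug in:

package Tie::PerKeyShare;
use strict;
use warnings;
use Tie::Hash;
our @ISA = ( 'Tie::StdHash' );    # inherit TIEHASH and the other boring parts

sub STORE {
    my ( $self, $key, $value ) = @_;
    # Plug in your share here: write $value into a shared scalar, an
    # mmapped file, a memcached slot, etc., keyed on $key.
    $self->{ $key } = $value;
}

sub FETCH {
    my ( $self, $key ) = @_;
    # ... and read it back from the same place here.
    return $self->{ $key };
}

package main;
tie my %hash, 'Tie::PerKeyShare';
$hash{foo} = 'bar';
print $hash{foo}, "\n";           # bar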
My thinking is that you might be accustomed to sharing huge segments of memory in C/C++. I'm not sure there's anything directly comparable in perl — though I wouldn't be surprised if there were. As a last resort, you could write something in XS or Inline::C. That sounds pretty interesting to me.
Re: Share hash across processes (threads?)
by BrowserUk (Patriarch) on Jun 08, 2007 at 14:43 UTC
What is the best method to handle access to variables in shared memory directly?
Best? That's an open question, but it's certainly easy. 25 x 2MB strings shared between 10+1 threads will run you ~80MB:
#! perl -slw
use strict;
use threads;
use threads::shared;

# -s on the shebang line enables switch parsing: run with -N=... and
# -M=... to override the defaults below.
our $N ||= 100;     # read passes per thread
our $M ||= 2e6;     # bytes per string

my %hash : shared;

sub process {
    my $tid = threads->self->tid;
    for ( 1 .. $N ) {
        for ( keys %hash ) {
            lock( %hash );
            printf "$tid: (%d) $_ => %s\n",
                length $hash{ $_ }, substr $hash{ $_ }, 0, 50;
        }
    }
}

# 25 keys, each holding a 2MB string: 'A' x 2e6, 'B' x 2e6, ...
$hash{ $_ } = chr( 64 + $_ ) x $M for 1 .. 25;

my @threads = map { threads->create( \&process ) } 1 .. 10;
$_->join for @threads;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Share hash across processes
by perrin (Chancellor) on Jun 08, 2007 at 15:36 UTC
You really can't access Perl variables in shared memory directly. Systems like IPC::Shareable and memcached will always require you to send the data through Storable and create a separate copy of any data you work with in your process. There is no way around this -- at some point you have to turn the stream of bytes coming from the storage you use into a perl variable, which means creating a new one in your local process. There are faster storage mechanisms than those, like BerkeleyDB, but they all have to create a local variable when you want to do something with the data.
It is possible to load the data up and then fork the processes. It will be shared and there will be only one copy due to copy-on-write. It's read-only though.
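A minimal sketch of that approach, with sizes mirroring the OP's numbers (note that Perl's reference counting dirties some pages over time, so the sharing isn't perfect):

use strict;
use warnings;

# Build the data once, *before* forking.
my %hash = map { $_ => chr( 64 + $_ ) x 2e6 } 1 .. 25;

my @pids;
for my $n ( 1 .. 10 ) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: the pages holding %hash are shared copy-on-write.
        my $total = 0;
        $total += length $hash{ $_ } for keys %hash;
        print "child $n sees $total bytes\n";
        exit 0;
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;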
It sounds like less than 500MB of data even if you load it separately in 10 programs. That doesn't cost much these days. I'd think twice before spending more time on this.
Your statement "You really can't access Perl variables in shared memory directly" is the conclusion I had reached myself after researching this and talking to a number of people over the past week. My intuition told me otherwise, but I thought I would throw it out to this group.
My larger issue is that I may have tens of these "groups of ten" jobs running at once, so the 500MB quickly becomes 5GB, plus I pay a performance penalty.
Re: Share hash across processes
by TilRMan (Friar) on Jun 09, 2007 at 04:49 UTC
Have a look at mmap(3). Caveat: I've never used any of the Perl interfaces to mmap.
Yet there are quite a few of them on CPAN.
I recall somebody even made an mmap module that works on Win32; the author was here and on the chatterbox while he was working on it, but I don't remember which user that was.
Maybe it was IPC::Mmap? Ah, no, it must have been Win32::MMF; the name rings a bell. Well, both look like reasonable approaches to me, and both have gotten recent updates, which is a good sign that they'll most likely still work. But don't discard the other candidates just yet.
From the top of the thread:
I have a hash (with about 20-25 keys) with very long bit strings as values (1-2MB) that I would like to share directly (readonly) in memory.
That looks like an acceptably small number of keys, so the sanest approach to me seems to be making a separate mmapped file per hash item. If the keys form a fixed set, you don't even have to share the list of names.
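A rough sketch of the one-file-per-key idea, using File::Map as one of those CPAN interfaces (the directory and key names are invented for illustration):

use strict;
use warnings;
use File::Map 'map_file';

my @keys = qw( alpha beta gamma );    # hypothetical fixed key set
my %hash;

for my $key ( @keys ) {
    # Map each value file read-only; the OS shares the pages between
    # every process that maps the same file.
    map_file $hash{ $key }, "/tmp/share/$key.bin", '<';
}

print length( $hash{alpha} ), " bytes for alpha\n";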
Re: Share hash across processes
by Moron (Curate) on Jun 11, 2007 at 14:38 UTC
The only way I can think of to meet the OP's requirement directly and avoid copying is to create a set of manipulation routines in C that can address the shared memory directly, and to call those from Perl to do anything with the data.
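A hedged sketch of what one such routine could look like, using Inline::C and a SysV segment (the 0xBEEF key is invented, and error handling is minimal):

use strict;
use warnings;
use Inline C => <<'END_C';
#include <sys/ipc.h>
#include <sys/shm.h>

/* Read one byte out of an existing SysV shared-memory segment,
   without ever copying the segment into a Perl scalar. */
int peek_shared( int key, int offset ) {
    int id = shmget( (key_t)key, 0, 0 );
    if ( id < 0 ) return -1;
    char *mem = (char *)shmat( id, NULL, SHM_RDONLY );
    if ( mem == (char *)-1 ) return -1;
    int byte = (unsigned char)mem[ offset ];
    shmdt( mem );
    return byte;
}
END_C

printf "byte 0 is %d\n", peek_shared( 0xBEEF, 0 );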
One other idea, depending on how picky you are about your definition of "shared memory": instead of using explicit shared memory, use data segments in a shared library. This gives you access to the functions in the library, and the function names will get copied (e.g. using P5NCI::Library), but provided you use the right C-compiler directives when declaring the data, the operating system (*) will load only one copy of those segments into memory, irrespective of how many concurrent users there are.
(update * I say "the operating system" but I take it from various clues that it's Unix of course ;))
__________________________________________________________________________________
^M Free your mind!