aberman has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have a complex problem that I'm hoping has a simple answer. I have a very large dataset (~3.5GB) that I need to process in parallel on a multi-core machine. In trying to use threads, I found that Perl creates a copy of the interpreter for each thread. This is a giant problem with a dataset this large, because the entire contents of memory get copied into each thread, so I run out of RAM very quickly. On Linux, kernel threads allow processes to share the same memory, and it is up to you to lock appropriately using semaphores or whatever. Is this possible in Perl, or is there a strategy I can use that won't involve copying the entire contents of memory into each iThread? I would hate to have to go use Java or C to do this. Up until now, I was convinced Perl was unstoppable; please prove me right. :)


Thanks in advance!

--Ari

Replies are listed 'Best First'.
Re: Using kernel-space threads with Perl
by BrowserUk (Patriarch) on Mar 21, 2011 at 23:48 UTC
    Is this possible in Perl, or is there a strategy I can use that won't involve copying the entire contents of memory into each iThread?

    Unfortunately, there is no way to do this efficiently in Perl if every thread needs to be able to access all the data.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Each thread does not necessarily need to see all the data. The job is embarrassingly parallelizable, so I was going to simply carve it up into pieces. The problem was getting each of the pieces into the threads: if I split up the data beforehand, all of it still gets copied into each thread. Maybe I'm using the wrong strategy?

      Thanks!
        The job is embarrassingly parallelizable, so I was going to simply carve it up into pieces.

        Then it may be possible to do something useful. It depends on where you are getting the data from and how it can be subdivided.

        A little more information about the data and the processing to be performed on that data might suggest a better technique.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        Create the threads first and then have each thread load just the data it needs (and don't share it, of course). Then there won't be extra copies of that stuff created.
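
        For instance, a minimal, untested sketch of that approach (the file name, worker count, and per-line work are placeholders): each thread seeks to its own byte range of the input and reads only that slice, and because the threads are spawned before anything large is loaded, there is nothing big for ithreads to clone.

            use strict;
            use warnings;
            use threads;

            my $file    = 'big_data.txt';   # hypothetical input file
            my $workers = 4;
            my $size    = -s $file;
            my $chunk   = int($size / $workers);

            # Create all threads BEFORE any large data is loaded, so
            # nothing big exists yet to be cloned into them.
            my @thr = map {
                my $start = $_ * $chunk;
                my $end   = $_ == $workers - 1 ? $size : $start + $chunk;
                threads->create(\&worker, $start, $end);
            } 0 .. $workers - 1;

            my $total = 0;
            $total += $_->join for @thr;
            print "processed $total records\n";

            sub worker {
                my ($start, $end) = @_;
                open my $fh, '<', $file or die "open $file: $!";
                seek $fh, $start, 0;
                <$fh> if $start;   # discard partial line at the chunk boundary
                my $count = 0;
                while (defined(my $line = <$fh>)) {
                    $count++;      # stand-in for the real per-record work
                    last if tell($fh) >= $end;
                }
                return $count;
            }

        The boundary handling is the usual trick: a worker finishes the line it started even if it crosses its end offset, and the next worker throws away the partial line it lands in, so every record is processed exactly once.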

        - tye        

Re: Using kernel-space threads with Perl
by hermida (Scribe) on Mar 22, 2011 at 13:40 UTC

    If the data that needs to be shared is too large for RAM, another good option is to use a fast DBM like KyotoCabinet, and in your Perl program just use forks with a library like Parallel::Forker, Parallel::ForkManager, Parallel::Prefork, etc.

    I've done things this way and it works really fast, as long as you aren't constantly writing to/changing the shared data (with KyotoCabinet you can specify the size of the RAM cache, but the rest of the DBM lives on the filesystem and will be slower). If you are creating the shared data structure once and then reading from it with your forks, it can be nearly as fast as RAM.
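
    Something along these lines (a rough sketch; the file path and keys are made up, and the calls follow the KyotoCabinet Perl binding's documented API and Parallel::ForkManager's start/finish interface): build the DBM once in the parent, then let each forked worker open its own read-only handle, since DB handles should not be shared across fork().

        use strict;
        use warnings;
        use KyotoCabinet;
        use Parallel::ForkManager;

        my $dbfile = 'dataset.kch';   # hypothetical DBM file

        # Build the DBM once, before forking (writer mode).
        {
            my $db = KyotoCabinet::DB->new;
            $db->open($dbfile, $db->OWRITER | $db->OCREATE)
                or die 'open: ' . $db->error;
            $db->set("key$_", "value$_") for 1 .. 1000;   # stand-in load
            $db->close;
        }

        # Each forked worker opens its own read-only handle.
        my $pm = Parallel::ForkManager->new(4);
        for my $w (1 .. 4) {
            $pm->start and next;   # parent: move on to the next worker
            my $db = KyotoCabinet::DB->new;
            $db->open($dbfile, $db->OREADER)
                or die 'open: ' . $db->error;
            my $val = $db->get("key$w");
            print "worker $w got $val\n";
            $db->close;
            $pm->finish;
        }
        $pm->wait_all_children;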

Re: Using kernel-space threads with Perl
by JavaFan (Canon) on Mar 22, 2011 at 01:40 UTC
    You may also consider using forks. Modern OSes give the appearance that all the data is copied, but they implement it using COW (copy-on-write), so in reality, data is only copied when it's rewritten in a process.

    Of course, as pointed out elsewhere in the thread, first creating threads (or processes) and only then reading in the data wins.
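
    A bare-bones sketch of that (the dataset here is a placeholder, and, as the follow-ups note, Perl can unshare COW pages merely by touching reference counts, so the savings are best-effort): load once in the parent, fork, and have each child make a read-only pass over its own slice.

        use strict;
        use warnings;

        # Load the large structure ONCE in the parent. Forked children
        # inherit its pages copy-on-write instead of getting real copies.
        my @data = (1 .. 5_000_000);   # stand-in for the real dataset

        my $workers = 4;
        my $step    = int(@data / $workers);
        my @pids;
        for my $w (0 .. $workers - 1) {
            my $pid = fork;
            die "fork: $!" unless defined $pid;
            if ($pid == 0) {   # child: read-only pass over its slice
                my $end = $w == $workers - 1 ? $#data : ($w + 1) * $step - 1;
                my $sum = 0;
                $sum += $data[$_] for $w * $step .. $end;
                print "worker $w sum: $sum\n";
                exit 0;        # exit without writing to @data
            }
            push @pids, $pid;
        }
        waitpid $_, 0 for @pids;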

      data is only copied when it's rewritten

      Or treated as a number or ...


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        But I suspect the real killer is when just inc/decrementing a ref count causes an entire page of memory to no longer be shared. But, despite this, I have seen some evidence of caches of Perl data staying partially shared, despite my expectation that this is way too easy to thwart. The case of each child not even looking at most of the data does make the odds improve some.

        If I wanted to share lots of data between Perl child processes, I'd probably at least consider storing that data via Judy.

        - tye        

Re: Using kernel-space threads with Perl
by zentara (Cardinal) on Mar 22, 2011 at 17:16 UTC
    or is there a strategy I can use that won't involve copying the entire contents of memory into each iThread?

    Sure. You can create a shared memory segment, and have your threads or forks read from the shmem. See SysV shared memory --pure perl for a rudimentary example.
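
    A stripped-down illustration using IPC::SharedMem from the IPC-SysV distribution (the payload is a stand-in; a real 3.5GB dataset would need a segment that big and enough SHMMAX headroom): the parent writes the data once, and each forked worker reads only its own byte range out of the segment, so nothing is copied per process up front.

        use strict;
        use warnings;
        use IPC::SysV qw(IPC_PRIVATE S_IRUSR S_IWUSR);
        use IPC::SharedMem;

        # Parent creates the segment and writes the dataset into it once.
        my $payload = join ',', 1 .. 1000;   # stand-in for the real data
        my $len     = length $payload;
        my $shm     = IPC::SharedMem->new(IPC_PRIVATE, $len, S_IRUSR | S_IWUSR)
            or die "shmget: $!";
        $shm->write($payload, 0, $len);

        my $workers = 4;
        my $chunk   = int($len / $workers);
        my @pids;
        for my $w (0 .. $workers - 1) {
            my $pid = fork;
            die "fork: $!" unless defined $pid;
            if ($pid == 0) {
                my $off  = $w * $chunk;
                my $size = $w == $workers - 1 ? $len - $off : $chunk;
                # Read only this worker's byte range out of the segment.
                my $slice = $shm->read($off, $size);
                print "worker $w read ", length($slice), " bytes\n";
                exit 0;
            }
            push @pids, $pid;
        }
        waitpid $_, 0 for @pids;
        $shm->remove;   # free the segment when done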

    Or, you could create a 3.5 gig ramdisk, and have your workers read from it.


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      This is all very good information (as expected). This is my first real try at parallelizing a large problem, so I'm sure I've made some mistakes with it. I'll try to implement some of these ideas later this week and report back to the thread. I'll also post any resulting (working) code at the end to help out others with this problem. Thanks all!