Load the 1.5 GB of data and then fork. That way, each child will have the data. A piece of the data is only copied when a child writes to it, so it's best not to write to that data unless it's absolutely necessary.
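Here is a minimal sketch of what that looks like, assuming a Unix-like system where fork is a real fork. The small array is just a hypothetical stand-in for the big data set, and the helper name child_view_after_write is made up for this example. It only shows that a write in the child stays private to the child; it does not measure how much memory actually stays shared.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the large data set; loaded once, before forking.
my @data = (1 .. 1000);

# Fork a child that overwrites one element and reports what it then sees.
sub child_view_after_write {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: this write only changes the child's private copy.
        close $reader;
        $data[0] = -1;
        print {$writer} $data[0], "\n";
        close $writer;          # flushes the pipe before leaving
        POSIX::_exit(0);        # skip END blocks inherited from the parent
    }

    # Parent: read the child's report and reap the child.
    close $writer;
    chomp(my $seen = <$reader>);
    close $reader;
    waitpid($pid, 0);
    return $seen;
}

my $child_value = child_view_after_write();
print "child saw:  $child_value\n";
print "parent has: $data[0]\n";
```

The child reports -1 while the parent still holds 1, so nothing the child writes ever leaks back into the parent's data.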
| [reply] |
Is there some magic going on behind a curtain, such that parent data isn't really copied until the child alters it?
| [reply] |
I want to have the data in memory only once, but have 4 processor cores working with it. Sounds easy, but I haven't found anything so far...
Getting threads to work with shared read/write memory is surprisingly hard. Ask any C++ programmer. Here's a tutorial on how to do it in Perl; it definitely exceeds the scope of a message board post. I have never used Perl threads myself; back when I thought I needed them, Perl threads weren't stable yet, and I haven't actually needed them since.
In a nutshell, life gets easier when you can use the data 100% read-only. Then you're not limited to threads as the only "easy" solution. You can just as easily whip up a forking version with parent-child communication or even a network server for the data, to which several lightweight forked processes connect. | [reply] |
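A rough sketch of the forking version with parent-child communication, assuming a Unix-like system and data that the workers only read. The array, the helper name parallel_sum, and the worker count are all made up for illustration. Each child sums its own slice and sends one line back through a pipe. Note that Perl may still touch memory while merely reading values, so how much stays physically shared is not guaranteed.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical read-only data set, loaded once in the parent.
my @data = (1 .. 1000);

# Split the index range across $workers children; each child sums its
# slice and sends the partial result back through its own pipe.
sub parallel_sum {
    my ($workers) = @_;
    my $chunk = int((@data + $workers - 1) / $workers);
    my @children;

    for my $w (0 .. $workers - 1) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;

        if ($pid == 0) {
            # Child: read-only pass over its slice of the inherited data.
            close $reader;
            my $first = $w * $chunk;
            my $last  = $first + $chunk - 1;
            $last = $#data if $last > $#data;
            my $sum = 0;
            $sum += $data[$_] for $first .. $last;
            print {$writer} "$sum\n";
            close $writer;
            POSIX::_exit(0);
        }

        # Parent: keep the read end, drop the write end.
        close $writer;
        push @children, { pid => $pid, reader => $reader };
    }

    # Collect one line per child and reap each child.
    my $total = 0;
    for my $child (@children) {
        my $fh = $child->{reader};
        chomp(my $partial = <$fh>);
        close $fh;
        waitpid($child->{pid}, 0);
        $total += $partial;
    }
    return $total;
}

print parallel_sum(4), "\n";
```

One pipe per child keeps the bookkeeping simple; for results larger than a single line you would need a real serialization format instead.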
Yeah, Perl threads are only useful if you need to share data in real time between threads, and even then they are slower than forking and using shared-memory IPC. BUT, when the data shared is minimal, threads are easier to set up and deal with. Pure C threads work so much better; Perl's handling of threads adds a lot of complications. For instance, in C the equivalent of a Perl shared variable (threads::shared) is just a global variable, and memory gets returned to the system when a thread is joined. Threads in Perl can give you the wrong impression of how well threads work in C... see gdk-threads
| [reply] |
You probably want to use the load-then-fork strategy. If you have too much writing to do for that to work (with copy-on-write sharing, changes made in one child are not seen by the other processes), either use a traditional database or something like BerkeleyDB. BerkeleyDB shares the data in memory and does not require remote calls, so it's faster than any RDBMS and allows read/write access to the shared data. | [reply] |