in reply to Re^3: Perl Threads and multi-core CPUs
in thread Perl Threads and multi-core CPUs

If you don't load the data until after the threads are spawned, isn't it the same deal, i.e. you mark it shared or else it's per-thread? I don't see the advantage in waiting, unless you know you only need the data in some of your threads.

If you have readonly data that is needed by many/all threads, mark it shared and you only get one copy (plus a few shared references).

If the data is globally modifiable, mark it shared and have one copy.

If the data is locally modifiable, you need one copy per thread, and it is far faster to clone it en masse than piecemeal via COW.
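
To make the shared-versus-cloned distinction concrete, here's a minimal sketch (the data is invented): %lookup exists once and every thread reads that single copy, while %scratch is cloned wholesale into each thread at spawn time.

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my %lookup : shared;              # one copy, visible to every thread
    %lookup = ( foo => 1, bar => 2 );

    my %scratch = ( count => 0 );     # unshared: cloned into each new thread

    my @workers = map {
        threads->create( sub {
            $scratch{count}++;        # touches this thread's private clone only
            return $lookup{foo};      # reads the single shared copy
        } );
    } 1 .. 4;

    $_->join for @workers;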

It would be great if threads could take advantage of COW, but at the moment they don't.

It doesn't make sense to use COW with threads; they can already share a single copy in memory.

COW only 'saves' if individual threads need to be able to modify local copies of small portions of large datasets. In this rare case, it is easy enough to a) share the large dataset (having set it readonly) and then b) copy the required portion to thread-local, non-shared storage during the process of modification.
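
A sketch of that a)-then-b) pattern (the record layout is invented, and shared_clone needs a reasonably recent threads::shared):

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    # a) one shared copy of the large dataset; shared_clone() shares nested refs too
    my $dataset = shared_clone(
        { map { $_ => { id => $_, value => $_ * 10 } } 1 .. 1_000 }
    );

    my @workers = map {
        my $key = $_;
        threads->create( sub {
            # b) copy just the record this thread must modify into ordinary,
            #    thread-local storage; the shared copy stays untouched
            my %local = %{ $dataset->{$key} };
            $local{value} *= 2;
            return $local{value};
        } );
    } 1 .. 4;

    print $_->join, "\n" for @workers;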


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^5: Perl Threads and multi-core CPUs
by perrin (Chancellor) on Sep 10, 2008 at 12:10 UTC
    The mechanism for sharing, threads::shared, doesn't look very practical for sharing a complex 1GB AI model. It looks like the code constructing the model would have to be modified a fair amount to construct things in the right order and mark every variable shared. Maybe you know a higher-level way to do this though.

    For read-write data, I do think threads would use less memory if the vast majority of the program is the shared AI model. For read-only data, COW will do pretty well at sharing with no changes to the code and will share things that are outside of the AI model as well if they don't get modified.

    COW with threads would help with all the variables that you don't have an easy way to mark shared but don't need separate copies of, e.g. all of the data used by all of the CPAN modules you load.

      The mechanism for sharing, threads::shared, doesn't look very practical for sharing a complex 1GB AI model. It looks like the code constructing the model would have to be modified a fair amount to construct things in the right order and mark every variable shared. Maybe you know a higher-level way to do this though.

      A lot depends on how that 1GB AI model is structured and used.

      If the vast majority of that 1GB is just structured data, then it isn't hard to build it as a shared data structure as the data is read from disc. Yes, it would mean modifying the existing module that loads the data, but that's no different to having to modify the loading routines if you want to use (say) numeric data with PDL.
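
      Something like the following, say, where the tab-separated file format is invented for the purpose of illustration:

          use strict;
          use warnings;
          use threads;
          use threads::shared;

          my %model : shared;

          open my $fh, '<', 'model.dat' or die "model.dat: $!";
          while ( my $line = <$fh> ) {
              chomp $line;
              my ( $key, $value ) = split /\t/, $line, 2;
              $model{$key} = $value + 0;   # store numerically as we go
          }
          close $fh;

          # threads spawned from here on see one shared copy of %model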

      threads::shared is specifically designed so that it silently does nothing if threads isn't loaded first, which means that you can theoretically mark data :shared internally to a module and still use it in a non-shared environment. If the program that loads the module doesn't use threads prior to loading the data module, the :shared annotations do nothing. I'll admit that this facility isn't particularly useful though.
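
      For example (module name invented), a module can mark its data :shared unconditionally; per the threads::shared documentation, the attribute does nothing when threads was never loaded:

          package My::Model;   # hypothetical module
          use strict;
          use warnings;
          use threads::shared; # harmless if the caller never loaded threads

          my %model : shared;  # shared under threads; a plain lexical otherwise

          sub set { my ( $k, $v ) = @_; $model{$k} = $v }
          sub get { my ($k) = @_; $model{$k} }

          1;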

      IMO, the single biggest flaw in the iThreads design is that it attempts to make using threads transparent. Nice idea, and for the simplest of uses it is even quite successful, but it comes at a cost: it makes the less-than-simple uses much harder than they need to be.

      There is a distinct lack of any real information in the OP about the nature of that 1GB AI. I've been waiting for some feedback from the OP, but as with so many of the recent "threads" threads, they seem to be posted more to instigate controversy than garner any real information. The missing information includes:

      • Is the AI in question used simply as reference for the processing to be performed, or is it modified by the processing performed?
      • Is it structured as data structure(s) + procedures; or blessed objects?
      • How much of the processing involves interaction with the AI?

      And in essence, that is all I've been trying to say in this thread. With the sparsity of available information, it isn't possible to determine whether this would be better written as a threaded or a forked process.

      If the AI is modified as the data is processed to reflect the state of that data, then threads are the easiest option. (Versus SysV shared memory or multiplexed pipe/socket communications.)

      If the AI is static for the processing, then forks and COW may be the most effective solution. Though as noted earlier, for best effect, the building of the AI dataset should ensure that numeric data is stored numerically at load, and that it is set readonly to avoid piecemeal copying.
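
      A hedged sketch of that load-time preparation; note that Internals::SvREADONLY is an unsupported core API (the Readonly CPAN module is a gentler alternative):

          use strict;
          use warnings;

          my @values;
          while ( my $line = <DATA> ) {
              chomp $line;
              push @values, $line + 0;   # populate the numeric slot now, not on first use
          }
          Internals::SvREADONLY( $_, 1 ) for @values;   # accidental writes now die

          # fork()ed children share these pages for as long as nothing writes to them

          __DATA__
          3.14
          2.71
          1.41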

      The mechanism for sharing, threads::shared, doesn't look very practical for sharing a complex 1GB AI model. It looks like the code constructing the model would have to be modified a fair amount to construct things in the right order and mark every variable shared. Maybe you know a higher-level way to do this though.

      Knowing the world you work in, I'll simply say that mod_perl and/or FastCGI and threads don't mix. The whole point of mod_perl is that you pre-load anything and everything that your application might use, up front, and then spawn & discard to process each connection. Those two things combine to be the very worst possible way to use iThreads.

      The only sensible way to use iThreads for a web app would be to start a pool of threads for each mode/state of the application, that each load only those modules and data they require for their processing, and then queue in-bound requests for each state to the appropriate pool. But I know of no web server environment that allows that level of control of the perl environment. In a nutshell, don't use threads for web apps.
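
      Were such control available, that pool-per-state arrangement might look something like this with Thread::Queue (states and handlers invented):

          use strict;
          use warnings;
          use threads;
          use Thread::Queue;

          my %queue = map { $_ => Thread::Queue->new } qw( login browse checkout );

          # a small pool per state; each worker would load only what its state needs
          my @pool;
          for my $state ( keys %queue ) {
              push @pool, threads->create( \&worker, $state ) for 1 .. 2;
          }

          sub worker {
              my ($state) = @_;
              while ( defined( my $request = $queue{$state}->dequeue ) ) {
                  # ... process $request for this state ...
              }
          }

          # the dispatcher routes each in-bound request to the right pool
          $queue{browse}->enqueue( 'GET /catalogue' );

          # shutdown: one undef per worker wakes it out of its dequeue loop
          $queue{$_}->enqueue( (undef) x 2 ) for keys %queue;
          $_->join for @pool;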

      There are some cases for small scale asynchronous processing for which starting a short lived thread might be useful in a web app environment, but mostly not. With FastCGI, spawning a thread to carry out a long-lived query can be an effective alternative to forking, but again it depends upon the web server and the level of control you have.
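
      The "spawn a thread for a long-lived query" case might look like this (the query and result store are stand-ins so the sketch runs):

          use strict;
          use warnings;
          use threads;

          # stand-ins for a real query and result store, so the sketch runs
          sub run_long_query { my ($sql) = @_; sleep 2; return "rows for: $sql" }
          sub store_result   { my ( $id, $result ) = @_; print "[$id] $result\n" }

          sub handle_request {
              my ($request) = @_;
              # fire off the slow query without tying up this request handler
              threads->create( sub {
                  store_result( $request->{id}, run_long_query( $request->{sql} ) );
              } )->detach;
              return "query started\n";
          }

          print handle_request( { id => 42, sql => 'SELECT ...' } );
          sleep 3;   # give the detached worker time to finish before exit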

      The biggest bugbear I have is that people insist on a black & white forks-or-threads approach to everything. Some things work best with forks (if they are available). Other things work better with threads. All too often, opinions are expressed as to which is best, on the basis of personal preference, long before enough information is forthcoming to make an informed decision as to which would be best for the particular application.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.