OK, another try with tags as suggested :-)

Hi all,

Sorry if this is a dumb/redundant question, but I couldn't find a definitive answer anywhere, so I decided to ask for collective wisdom.

I am working on a data classification and machine learning project, and I have a large data set that I need to process. The job will run on a multi-core CPU, and since the data set items are independent, the set can be split into multiple units for processing in order to take advantage of all the cores.

Obviously the first things that come to mind are threads and forking. I wrote a version based on forking and it works fine, but it is a RAM hog b/c when you fork, every new process is a copy of the parent, and the parent in my case is quite large b/c it loads an AI model that consumes about 1GB of RAM. So each child becomes a 1GB monster, and I run the risk of either thrashing the swap, which kills performance, or running out of RAM altogether if another process kicks in somehow.
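
For illustration, here is a minimal sketch of the fork-based split described above. It is not the actual script: `@items`, `$workers`, and `process_item` are hypothetical stand-ins for the real data set, the number of cores to use, and the model-based classification step.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: the real data set and model are not shown here.
my @items   = (1 .. 20);
my $workers = 4;

# Deal the independent items round-robin into one chunk per worker.
my @chunks;
push @{ $chunks[ $_ % $workers ] }, $items[$_] for 0 .. $#items;

# Fork one child per chunk; each child starts as a copy of the parent.
my @pids;
for my $chunk (@chunks) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: process only its own chunk, then exit.
        process_item($_) for @$chunk;
        exit 0;
    }
    push @pids, $pid;    # Parent: remember the child and keep going.
}

# Parent: wait for every child and count non-zero exit statuses.
my $failed = 0;
for my $pid (@pids) {
    waitpid($pid, 0);
    $failed++ if $? != 0;
}
print "children failed: $failed\n";

# Placeholder for the real per-item work (assumption).
sub process_item {
    my ($item) = @_;
    return $item * 2;
}
```

In this sketch the children do not send results back to the parent; a real job would need a pipe, a file, or similar for that.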

With threads it seems that it would be easier since threads have access to global variables defined in the parent, so all spawned threads would share the same AI model and I won't have multiple 1GB copies of the parent. In the thread case I obviously have to worry about locking but that's not an issue as I can implement it. The bigger issue is that it seems that Perl threads live INSIDE the spawning process, so they don't get scheduled on separate CPUs but simply compete for run time within the spawning process. I tried some tests and indeed on Linux the "top" command shows only one Perl process running on one of the 8 available CPUs even though I have 8 threads running. So with threads I am not achieving any speedup on multi-core CPUs.
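
A minimal sketch of the kind of test I mean (not the exact script I ran): it starts 8 threads that each run a CPU-bound loop, so CPU usage can be watched with "top" while it runs. The `burn` sub and the loop size are made up for illustration, and it assumes a Perl built with thread support.

```perl
use strict;
use warnings;
use threads;

my $n_threads = 8;    # one per available CPU in my case

# Hypothetical CPU-bound busy work standing in for real processing.
sub burn {
    my ($id) = @_;
    my $sum = 0;
    $sum += $_ % 7 for 1 .. 5_000_000;
    return $sum;
}

# Start all the threads, then wait for each one to finish.
my @workers = map { threads->create(\&burn, $_) } 1 .. $n_threads;
my @results = map { $_->join } @workers;

print scalar(@results), " threads finished\n";
```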

Does anybody know if Perl supports kernel threads that the OS can then schedule on multiple CPUs? I read the Perl thread tutorial and all it says is that each thread loads a new Perl interpreter. But from what I see, that doesn't result in a new runnable object, separate from the spawning process, that can be scheduled to run on a CPU other than the one used by the spawning process. That said, are there any modules on CPAN that provide thorough parallelization of tasks so that Perl can take advantage of multiple CPUs? Any help is appreciated. Thanks.


In reply to Perl Threads and multi-core CPUs by haidut
