in reply to Does anyone know about the Multi-CPU Module

You don't tell us why your script is slow. Until you've determined what is holding your script back, there is very little sense in throwing multiple CPUs at it. MCE will likely need a major rewrite of your script to make it work in parallel, and you haven't told us that your script is CPU bound.

If you can compile your own Perl, building it without thread support usually gives a performance improvement of roughly 10%.

Using a different algorithm can speed up your program by large factors.

Caching more data between runs can also speed up your program vastly, at the cost of disk space.
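As a rough illustration of the caching idea, here is a minimal sketch using the core Storable module. The names expensive_step and cached and the cache file name are made up for this example, and the sketch assumes the result is a plain data structure that stays valid from one run to the next.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use Storable qw(store retrieve);

# Hypothetical stand-in for whatever slow step the script repeats on every run.
sub expensive_step {
    my ($n) = @_;
    my $sum = 0;
    $sum += $_ * $_ for 1 .. $n;
    return { n => $n, sum_of_squares => $sum };
}

# Return the stored result if the cache file exists;
# otherwise run the computation and store its result on disk.
sub cached {
    my ($cache_file, $compute) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $result = $compute->();
    store($result, $cache_file);
    return $result;
}

# A temporary directory keeps this demo self-cleaning. A real script would
# use a stable path so that the cache survives between runs.
my $dir        = tempdir(CLEANUP => 1);
my $cache_file = File::Spec->catfile($dir, 'squares.sto');

my $first  = cached($cache_file, sub { expensive_step(1000) });      # computed and stored
my $second = cached($cache_file, sub { die "should not recompute" }); # read back from disk

print "sum of squares: $second->{sum_of_squares}\n";
```

The sketch never invalidates the cache. If the input data can change, the cache file needs a key or timestamp check, which is where the real cost of this approach usually lies.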

First of all, you have to determine where your program is slow. Devel::NYTProf will likely help you there.
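Devel::NYTProf is usually run as perl -d:NYTProf yourscript.pl followed by nytprofhtml. Before that, a coarse check of whether the script is CPU bound at all is to compare wall-clock time with CPU time. The sketch below uses only core modules. The helper name measure and the busy loop are placeholders for this example, not part of any module.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Run a code ref and return (wall-clock seconds, CPU seconds) for this process.
# CPU seconds are user + system time as reported by the built-in times().
sub measure {
    my ($code) = @_;
    my $wall0 = Time::HiRes::time();
    my ($user0, $sys0) = times;
    $code->();
    my ($user1, $sys1) = times;
    my $wall = Time::HiRes::time() - $wall0;
    my $cpu  = ($user1 - $user0) + ($sys1 - $sys0);
    return ($wall, $cpu);
}

# Placeholder workload: replace with a call into the slow part of the script.
my ($wall, $cpu) = measure(sub {
    my $x = 0;
    $x += sqrt($_) for 1 .. 2_000_000;
});

printf "wall %.2fs, cpu %.2fs\n", $wall, $cpu;
```

If the CPU time is close to the wall-clock time, the code is CPU bound and more cores might help. If the CPU time is much smaller, the script is mostly waiting on disk, network, or a database, and extra workers will not fix that.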

Re^2: Does anyone know about the Multi-CPU Module
by Anonymous Monk on Feb 23, 2014 at 00:40 UTC
    Hm, it turned out that I was wrongly using the Switch module instead of a normal if-elsif chain... Now the code is something like 80% faster, so I guess no multi-threading is needed!
    Thank you!
Re^2: Does anyone know about the Multi-CPU Module
by marioroy (Prior) on Dec 13, 2014 at 06:20 UTC
    Hi Corion,

    MCE requiring a "major" rewrite is a strong phrase. Not all use cases require a "major" rewrite. Typically, the changes amount to replacing print with MCE->print or MCE->say. The other change is writing an output iterator if you want to preserve output order, but that is added code rather than a change to the original code.

    MCE::Loop can wrap serial code quite nicely, only changing a few lines.

    Serial code.

    my @input = 100..200;
    foreach (@input) {
        print "$_\n";
    }

    MCE code.

    use MCE::Loop max_workers => 8, chunk_size => 1;
    my @input = 100..200;
    mce_loop {
        MCE->print("$_\n");
    } @input;

    MCE::Map requires only a single-line change: replacing map with mce_map.

    use MCE::Map;
    use Time::HiRes 'sleep';   # for simulating work
    my @a = mce_map { sleep rand; $_ * 1.618 } 1 .. 100;
    print "@a\n";

      Sure - as long as your code is cleanly structured and basically functional code without side effects, that's no problem.

      As soon as you have global variables and iterate over them, or keep state in the database, this no longer applies. For example, the following construct will need a major rewrite of the logic because DBI statement handles do not survive fork():

      my $sth_items= $dbh->prepare(<<SQL);
          select * from myitems where date > '20140101'
      SQL
      $sth_items->execute;   # the statement must be executed before fetching
      while(my $item= $sth_items->fetchrow_hashref()) {
          ... lengthy process for each item ...
      };

      You cannot easily apply mce_loop or mce_map there, because you don't want to fetch all rows into the process. So here, the program logic will need a major rewrite.

        Corion, that is a great example for demonstrating an input iterator with MCE. A global dbh variable is not necessary. Below, the db_iter closure is called by the manager process each time a worker requests the next input item.

        Pretty much everything is wrapped inside db_iter for db apps. The closure simply returns the next row.

        use MCE::Loop chunk_size => 1, max_workers => 4;
        use DBI;
        sub db_iter {
            my $dbh = DBI->connect("dbi:SQLite:dbname=sample.db", "", "", {
                PrintError => 0, RaiseError => 1, AutoCommit => 1,
                FetchHashKeyName => 'NAME_lc',
            });
            my $sth = $dbh->prepare("select fname, lname from people");
            $sth->execute();
            return sub {
                if (my @row = $sth->fetchrow_array) {
                    return @row;
                }
                return;
            }
        }
        mce_loop {
            my ($mce, $chunk_ref, $chunk_id) = @_;
            my ($fname, $lname) = @{ $chunk_ref };
            MCE->say("Hello, $fname $lname");
        } db_iter();

        See https://metacpan.org/pod/MCE::Core#SYNTAX-for-INPUT_DATA if chunking is desired. The db_iter example on that page is written using the core API. Below is the same example using MCE::Loop instead.

        use MCE::Loop;
        use DBI;
        sub db_iter {
            my $dsn = "DBI:Oracle:host=db_server;port=db_port;sid=db_name";
            my $dbh = DBI->connect($dsn, 'db_user', 'db_passwd') ||
                die "Could not connect to database: $DBI::errstr";
            my $sth = $dbh->prepare('select color, desc from table');
            $sth->execute();
            return sub {
                my $chunk_size = shift;
                if (my $aref = $sth->fetchall_arrayref(undef, $chunk_size)) {
                    return @{ $aref };
                }
                return;
            }
        }
        ## Let's enumerate column indexes for easy column retrieval.
        my ($i_color, $i_desc) = (0 .. 1);
        MCE::Loop::init {
            max_workers => 3, chunk_size => 1000,
            input_data => db_iter(),
        };
        mce_loop {
            my ($mce, $chunk_ref, $chunk_id) = @_;
            my $ret = '';
            foreach my $row (@{ $chunk_ref }) {
                $ret .= $row->[$i_color] .": ". $row->[$i_desc] ."\n";
            }
            MCE->print($ret);
        };
        MCE::Loop::finish;

        But one thing is missing from the two examples above: workers may also need to communicate with the DB. The user_begin and user_end options are where workers connect to and disconnect from the database. The dbh handle is stored in the MCE hash for later retrieval by the loop block, so each worker obtains its connection only once.

        MCE::Loop::init {
            max_workers => 3, chunk_size => 1000,
            input_data => db_iter(),
            user_begin => sub {
                my ($mce) = @_;
                my $dsn = "DBI:Oracle:host=db_server;port=db_port;sid=db_name";
                $mce->{dbh} = DBI->connect($dsn, 'db_user', 'db_passwd') ||
                    die "Could not connect to database: $DBI::errstr";
            },
            user_end => sub {
                my ($mce) = @_;
                $mce->{dbh}->disconnect;
            },
        };
        mce_loop {
            my ($mce, $chunk_ref, $chunk_id) = @_;
            my $dbh = $mce->{dbh};
            my $ret = '';
            foreach my $row (@{ $chunk_ref }) {
                $ret .= $row->[$i_color] .": ". $row->[$i_desc] ."\n";
            }
            MCE->print($ret);
        };