in reply to Taking advantage of dual processors [tag://parallel,cygwin,multi-core,fork,i/o]

This should work anywhere including cygwin:

use threads;
use Thread::Queue;

my $Q = new Thread::Queue;

my $thread = async {
    ## create/open DB here
    my $dbs = ...;
    while( my $query = $Q->dequeue ) {
        $dbs->query( $query );
    }
    ## finish with DB here
};

while (<I>) {
    my $line = $_;

    my @bucket = $bucket->based_on($line); # Data::Bucket

    my @acct_vals;
    for (@bucket) {
        # %hash = calculate based on input and bucket value
        push @acct_vals, \%hash;
    }

    next unless @acct_vals;

    ## Sort by distance (Schwartzian transform)
    @acct_vals =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [ $_, $_->{dist} ] } @acct_vals;

    my @dump;
    for my $acct_val (@acct_vals) {
        last if $acct_val->{dist} > $match_threshold * 1.5;
        push @dump, $acct_val;
    }

    my $d = Data::Dumper->new([\@dump], ['dump']);
    $d->Purity(1)->Terse(1)->Deepcopy(1);

    ## Build sql and post to db thread (Could be done better)
    my $args = "'" . join( "','", $acct_name, $clean_acct_name, $d->Dump ) . "'";

    sleep 1 while $Q->pending > 10; ## Adjust threshold to suit.

    $Q->enqueue( sprintf 'INSERT INTO addresses VALUES (%s)', $args );

    unless (++$counter % $report_very) {
        warn "$counter in $now"; # Time::Lapse by Scott Godin
    }
}

$Q->enqueue( undef ); ## Terminate db thread
$thread->join;        ## And wait for it to finish.
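
For anyone who wants to try the pattern without the database or the bucket code, here is a minimal self-contained sketch of the same producer/consumer arrangement. It assumes a perl built with ithreads. The worker only records the statements it receives (a stand-in for the real DB calls), and the loop of five made-up INSERT strings is purely illustrative. It also tests `defined` on the dequeued item, so that only the `undef` sentinel, and not any other false value, ends the worker.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $Q = Thread::Queue->new;

# Consumer thread: stands in for the DB thread. It only records the
# statements it receives instead of talking to a real database.
my $thread = async {
    my @seen;
    while ( defined( my $query = $Q->dequeue ) ) {
        push @seen, $query;    # a real worker would run the query here
    }
    return scalar @seen;       # handed back to the main thread by join
};

# Producer: the main thread builds statements and posts them.
for my $n ( 1 .. 5 ) {
    $Q->enqueue( sprintf 'INSERT INTO addresses VALUES (%d)', $n );
}

$Q->enqueue( undef );          # sentinel: tell the worker to stop
my $handled = $thread->join;   # wait for the worker and collect its count
print "worker handled $handled statements\n";
```

The main thread never waits on the worker except at the final join, which is the point of handing the DB work to a second thread.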

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: Taking advantage of dual processors (threads)
by metaperl (Curate) on Nov 19, 2007 at 21:23 UTC
    Wait a minute, Threads::Queue has been backpanned. It's not on CPAN.

    Also, it seems that all solutions are suggesting lightweight processes. But I have 2 CPUs, so why shouldn't I be going for 1 heavyweight process per CPU: one for the file reading and writing and the other for a database commit?

    I have beheld the tarball of 22.1 on ftp.gnu.org with my own eyes. How can you say that there is no God in the Church of Emacs? -- David Kastrup
      Threads::Queue has been backpanned.

      Sorry, typo. It should be Thread::Queue per the usage in my example:

      my $Q = new Thread::Queue;

      (And no, you don't have to use indirect object syntax :)
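
As a small illustration of that aside, the two spellings below call the same constructor. This is only a sketch; the variable names are made up for the example.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q1 = new Thread::Queue;     # indirect object syntax, as in the example
my $q2 = Thread::Queue->new;    # the same constructor, called as a class method

$q2->enqueue( 'a', 'b' );
print ref($q1), " ", $q2->pending, "\n";
```

Both variables hold ordinary Thread::Queue objects; the arrow form is generally the less ambiguous of the two to parse.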

      Also, it seems that all solutions are suggesting lightweight processes. But I have 2 CPUs, so why shouldn't I be going for 1 heavyweight process per CPU: one for the file reading and writing and the other for a database commit?

      No reason except that then you'll need to communicate between processes. IPC is mostly not portable. Mostly inefficient. Mostly a pain in the tush.

      You need a communications channel: pipes, sockets, or shared memory.

      You need a communications protocol or interprocess semaphores.

      But mostly, you just moved the problem. Now you have to block on reads and writes from your IPC channel.
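
To make those three points concrete, here is a rough sketch of the two-process version using fork and a pipe. It is an illustration only, not a recommendation: the channel is the pipe, the "protocol" is one query per line with EOF as the terminator, and the child still blocks on every read. The three dummy query lines and the use of the exit status to report back are assumptions made for the example.

```perl
use strict;
use warnings;

pipe( my $reader, my $writer ) or die "pipe failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ( $pid == 0 ) {
    # Child: plays the "database" process. It blocks on each read.
    close $writer;
    my $count = 0;
    $count++ while <$reader>;
    exit( $count == 3 ? 0 : 1 );    # report success through the exit status
}

# Parent: plays the "file reading" process.
close $reader;
print {$writer} "query $_\n" for 1 .. 3;    # newline-delimited protocol
close $writer;                              # EOF tells the child to stop

waitpid( $pid, 0 );
my $child_ok = ( $? >> 8 ) == 0;
print "child ", ( $child_ok ? "received all 3 lines" : "failed" ), "\n";
```

Everything the queue gives you for free (framing, termination, passing a result back) has to be invented by hand here.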

      Thread::Queue is available, tested and works well.

      You're also barking up the wrong tree thinking of it as one process per CPU. Each process (or thread) will get run on whichever CPU comes free first when it is next ready to run. Both processes or threads could end up always running on the same CPU because some other process in your system is hogging the other one.

