in reply to Re^4: Multi-threads newbie questions
in thread Multi-threads newbie questions

Why doesn't this work?

Really? Works for me:

#! perl -slw
use strict;
use threads;
use threads::shared;
use Thread::Queue;
use Data::Dump qw[ pp ];

sub helper {
    my $Q = shift;
    while( my $ref = $Q->dequeue ) {
        lock $ref;
        $ref->{NEW_KEY} = 1;
    }
}

sub my_sub {
    my( $ref, $n ) = @_;
    my $Q = new Thread::Queue;
    my @threads = map async( \&helper, $Q ), 1 .. $n;
    $Q->enqueue( values %{ $ref } );
    $Q->enqueue( (undef) x $n );
    $_->join for @threads;
}

my $hoh = {
    A => { NAME => 'aa' },
    B => { NAME => 'bb' },
};
$hoh = shared_clone( $hoh );
pp $hoh;

my_sub( $hoh, 2 );
pp $hoh;

__END__
C:\test>junk39
{
  A => { # tied threads::shared::tie
    NAME => "aa",
  },
  B => { # tied threads::shared::tie
    NAME => "bb",
  },
}
{
  A => { # tied threads::shared::tie
    NAME => "aa",
    NEW_KEY => 1,
  },
  B => { # tied threads::shared::tie
    NAME => "bb",
    NEW_KEY => 1,
  },
}
Second, if I understand correctly, you start a thread for each element of hoh. There are usually a few hundred elements there, so I guess it's not such a good idea. That's why I originally wanted to use a thread pool, until you advised me otherwise.

I didn't advise you against using a pool of threads. Only against modules that purport to make using a thread pool "simple", in the most complicated (and broken) ways imaginable.

The code above (also posted at 860833) implements a pool of threads. I posted the non-pooled version first, in order to show how simple the transition from a non-pooled to a pooled solution is. How one is a very small extension of the other.
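For contrast, the non-pooled shape is roughly this (a sketch of the idea, not the exact code from the earlier post; it assumes the same use threads / use threads::shared preamble as above):

sub helper_one {            # one unit of work, in its own thread
    my $href = shift;
    lock $href;
    $href->{NEW_KEY} = 1;
}

sub my_sub_unpooled {
    my $ref = shift;
    ## one thread per element; no queue needed
    my @threads = map async( \&helper_one, $_ ), values %{ $ref };
    $_->join for @threads;
}

The pooled version replaces the per-element threads with N long-lived ones and feeds them through a queue; everything else stays the same.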

And why I've never written or used a module to do it. It isn't necessary.

Indeed, the philosophy behind most of those modules is utterly wrong. They manage the number of threads according to how much work is in the queue, for each thread!?! Which is nonsense, because the number of cores doesn't vary. The processing power of the CPU doesn't vary.

So, at exactly the moment when the CPU is already overloaded--as indicated by the fact that the queue is growing faster than the existing pool of threads can keep up with--what do they do? They start another thread!

Which just means that they've stolen a bunch of cycles to start that thread. And now there is one more thread competing for resources, and having to be task switched by the system scheduler. Which simply slows the throughput of all of the threads.

This asinine methodology is aped from "fork pool" modules, which are equally broken for the same reasons.

The above pooling mechanism starts N threads. You, the programmer, decide how many threads to run by trial and error. For CPU-bound processing, start with N = cores. For IO-bound, try N = cores * 2, or 3, or 4. You'll quickly find a number that works for your application; then make it the default. Move to a different system with more or fewer cores, and adjust it accordingly.
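In code, that's just a default the user can override. For instance, the -s on the shebang line above gives you a -N=8 command-line switch for free (a sketch; assumes perl 5.10+ for //=):

our $N;          # set by perl's -s switch handling, e.g.:  junk39 -N=8
$N //= 4;        # otherwise, the default you found by trial and error
my_sub( $hoh, $N );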


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^6: Multi-threads newbie questions
by daverave (Scribe) on Sep 20, 2010 at 14:54 UTC
    Thanks again. This indeed works perfectly.
    I do have one more question. One of the things my 'process_genome' subroutine has to do is run a few Perl scripts (from the JBrowse browser package, if it's of interest).

    I did not write those scripts and have no control over them. These scripts need to be run in the "correct" directory (they do their work in the current working directory rather than taking a directory path as an argument). So my 'process_genome' has to 'chdir' to the genome directory. However, since the current working directory is shared by all threads (as I learned), it was suggested that I write the following wrapper:

    # forks, then changes working directory and executes a command
    sub chdir_and_system {
        my ( $dir, @exec_args ) = @_;
        my $pid = fork;
        die "couldn't fork: $!" unless defined $pid;
        unless ($pid) {    # this is the child
            chdir($dir);
            # open STDOUT, ">/dev/null"; # silence subprocess output
            # open STDERR, ">/dev/null"; # silence subprocess stderr
            system(@exec_args) == 0
                or die "system @exec_args failed";
        }
        wait;
        die "child error $?" if ($?);
    }
    Then I call each of the external scripts using something like this:
    sub add_gff_track {
        my ( $browser_project_dir, $gff_file, $track_label, $key, @optional_args ) = @_;
        my @args = (
            "bin/flatfile-to-json.pl",
            "--gff=$gff_file",
            "--tracklabel=$track_label",
            "--key=$key",
            @optional_args,
        );
        chdir_and_system( $browser_project_dir, @args );
    }
    It was originally suggested that I use 'exec' in the wrapper, but after some reading I thought that perhaps 'system' should be used; I'm not sure. The thing is, I need to know that when this wrapper sub returns, the command has also finished (other commands may rely on its output). I'm also not sure the rest of the wrapper logic is OK. Bottom line: my threads terminate abnormally. Perhaps the JBrowse scripts don't return a good exit value even when they succeed?
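    Looking at it again, one thing that worries me: nothing makes the child exit after 'system', so it seems the child would fall through into the parent's 'wait'/'die' code and then go on running the rest of the program. Perhaps something like this sketch instead, with 'exec' in the child (so nothing can run after the command) and 'waitpid' in the parent (so the sub provably doesn't return until the command has finished)?

    # sketch: exec in the child, waitpid in the parent
    sub chdir_and_system {
        my ( $dir, @exec_args ) = @_;
        my $pid = fork;
        die "couldn't fork: $!" unless defined $pid;
        unless ($pid) {                       # child
            chdir($dir) or die "chdir $dir: $!";
            exec(@exec_args);                 # replaces the child; never returns on success
            die "exec @exec_args failed: $!";
        }
        waitpid( $pid, 0 );                   # parent: block until the child is done
        die "child error $?" if $?;
    }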

    How would you go about this?

      How would you go about this?

      I would avoid fork and threads in the same program--but then, on my platform, I avoid fork at almost any cost.

      On my platform (windows), each thread has its own current working directory:

      c:\test>perl -Mthreads -le"@t = map{ async( sub{ chdir shift; sleep 3; print `cd` }, $_ ) } qw[c:\\ c:\\test c:\\test\\_Inline]; $_->join for @t"
      c:\
      c:\test
      c:\test\_Inline

      But I seem to remember that doesn't hold true for other platforms.

      In any case, I'd simply do:

      system "cd $path && theProg @args";

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I finally went with:

        my $out = capture( [0], "cd $path ; theProg @args" );

        The forking gave me so many problems... I have no idea why I started using it.
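        (That 'capture' with a leading arrayref of permitted exit values looks like IPC::System::Simple's interface; assuming that's the module in play, the full incantation is roughly:)

        use IPC::System::Simple qw( capture );
        # [0]: only exit status 0 counts as success; anything else throws
        my $out = capture( [0], "cd $path ; theProg @args" );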