in reply to Re^3: Sharing Hash Question
in thread Sharing Hash Question

I apologize about my original post's vagueness. I have never posted on here before and had forgotten about the tags that you can use.

Understood.

Has your program sped up your processing even slightly?

I'll assume the answer is no. There are several overlapping reasons for why that must be the answer.

The first is this:

for( my $i = 0; $i < MAX_THREADS; $i++ ) {
    threads->create( \&thread, $q )->join;
}

The effect of creating many threads in a loop, but waiting inside that loop for each one to finish (join()) before starting the next, is exactly the same as if you had just called the subroutine many times, one after the other.

I.e., the code above is exactly the same as doing:

thread( $q ); thread( $q ); thread( $q ); thread( $q ); thread( $q );
thread( $q ); thread( $q ); thread( $q ); thread( $q ); thread( $q );

Except that, in addition to not speeding things up, you have made the work take considerably longer, because you have added the overhead of starting 10 threads and of locking and manipulating shared hashes.

You can correct that by starting all the threads in the loop, and then waiting for them all to finish after the loop, so that they can run concurrently:

my @threads = map threads->create( \&thread, $q ), 1 .. MAX_THREADS;
$_->join for @threads;
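
For reference, a minimal self-contained sketch of that shape (the queue contents and the per-item work below are only placeholders; the point is the start-everything-first, join-afterwards structure):

use strict;
use warnings;
use threads;
use Thread::Queue;

use constant MAX_THREADS => 10;

# Placeholder work items; the real code would enqueue whatever thread() expects.
my $q = Thread::Queue->new( 1 .. 1000 );
$q->enqueue( ( undef ) x MAX_THREADS );   # one terminator per worker

sub thread {
    my( $q ) = @_;
    while( defined( my $item = $q->dequeue ) ) {
        # ... do the real per-item work here ...
    }
}

# Start all the workers first ...
my @threads = map threads->create( \&thread, $q ), 1 .. MAX_THREADS;

# ... and only join them after they have all been started.
$_->join for @threads;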

This will run more quickly than your code above, but still not faster than a single-threaded process doing the same work.

When you've convinced yourself that is true, come back and I'll explain why and what you can do about it.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^5: Sharing Hash Question
by jmmach80 (Initiate) on Jul 05, 2012 at 22:33 UTC
    Yeah, I didn't mean to do the join() like that. The actual script does it the way you described. I just had to throw this together quickly for the post.

      And have you compared the performance of the hash population parts of the single-threaded and multi-threaded versions?
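
      If it helps, one rough way to make that comparison, as a sketch only (Time::HiRes for the clock; the insertion loops stand in for the real population code):

      use strict;
      use warnings;
      use threads;
      use threads::shared;
      use Time::HiRes qw( time );

      my %plain;
      my %shared :shared;

      # Populate an ordinary hash from a single thread.
      my $start = time;
      $plain{ $_ } = $_ for 1 .. 1_000_000;
      printf "unshared, 1 thread : %.3f s\n", time - $start;

      # The same number of insertions into a shared hash, split across 4 threads.
      $start = time;
      my @workers = map {
          my $base = $_ * 250_000;
          threads->create( sub {
              $shared{ $base + $_ } = $base + $_ for 1 .. 250_000;
          } );
      } 0 .. 3;
      $_->join for @workers;
      printf "shared,   4 threads: %.3f s\n", time - $start;

      If the shared, threaded half comes out slower, that difference is the locking and shared-hash overhead referred to above.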



        When I get back to work on Monday, I plan to update the actual script. I'm dealing with files that have 30-45 million rows. I'm hoping that spawning 10+ threads (on a machine with 15+ CPUs), each doing nothing but parsing files, will help reduce the runtime compared with working through the files sequentially, one by one.
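
        A sketch of one way that per-file split might look (the file list, the parsing, and the counts being gathered are placeholders rather than the real script); each worker keeps a private hash and hands it back when it is joined:

        use strict;
        use warnings;
        use threads;
        use Thread::Queue;

        use constant MAX_THREADS => 10;

        my @files = glob '*.log';                   # placeholder for the real file list
        my $q     = Thread::Queue->new( @files );
        $q->enqueue( ( undef ) x MAX_THREADS );     # one terminator per worker

        sub worker {
            my %counts;                             # private to this thread: no locking
            while( defined( my $file = $q->dequeue ) ) {
                open my $fh, '<', $file or die "open '$file': $!";
                while( <$fh> ) {
                    # ... real parsing goes here; this only counts lines per file ...
                    $counts{ $file }++;
                }
                close $fh;
            }
            return %counts;                         # handed back to the joining thread
        }

        my @workers = map threads->create( \&worker ), 1 .. MAX_THREADS;

        my %merged;
        for my $t ( @workers ) {
            my %part = $t->join;                    # list context: the hash comes back
            $merged{ $_ } += $part{ $_ } for keys %part;
        }

        Whether that beats a single process will depend on how much of the per-file work is CPU-bound parsing rather than disk reads, so it is worth timing both versions as suggested above.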