in reply to Re: passing hashes between threads
in thread passing hashes between threads

Thank you for your comprehensive and helpful answers!

I will need to think it over - but some quick remarks:

First, some comments on the problem as a whole.
The input contains different record types because there is transaction data and master data. Most of the work is finding the right data in a transactional record, doing some (static) recoding, and cross-referencing via mapping files. This I do in ParseDok - so "parse" is a little shorthand :-)
But some data (the minor part) depends on previous records. That means that a record X announces a numbering change which then has to be applied to all following records.
So the writer thread is not only writing but also maintaining the original order and doing some filtering and code mapping as well. Sorry - I tried to keep my post short.
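For illustration only - the real ParseDok is ~1,100 lines and is not shown here - a hypothetical sketch of the two kinds of work described above might look like this. The record layout, the mapping hash and all names are invented, not the actual code:

    # Hypothetical sketch: stateless recoding via a mapping hash, plus a
    # stateful "numbering change" record that affects every later record.
    use strict;
    use warnings;

    my %code_map      = ( A01 => 'X01', A02 => 'X02' );  # in reality loaded from a mapping file
    my $number_offset = 0;                                # state carried across records

    sub process_record {                                  # invented stand-in for part of ParseDok
        my( $rec ) = @_;                                  # $rec: { type, code, number, ... }

        if( $rec->{type} eq 'RENUMBER' ) {                # a record announces a numbering change...
            $number_offset = $rec->{offset};              # ...that applies to all following records
            return;                                       # nothing to write for this record itself
        }

        $rec->{code}    = $code_map{ $rec->{code} } // $rec->{code};  # static recode
        $rec->{number} += $number_offset;                             # the order-dependent part
        return $rec;
    }

This is the part that forces the records to be processed (or at least finalised) in their original order.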

Push references to already shared hashes.
I tried this - it was slower than the deep copy

Why do you want to queue hashes from one thread to the other in the first place
This was how I did it in the single-threaded version. So my first try was to put the parsing into worker threads and pass back the existing hashes. Now I am working on a new solution - hence my questions.

Going back to your original application rather than your wholly artificial test code
I did not think it artificial, because it is more or less the isolated code fragment of my thread handling. It is my test/experimental code for trying out new solutions. Sub ParseDok alone is ~1,100 lines of code (sure, not in one function!). I was interested in measuring the time differences for passing data between threads, to get a feel for that.

Re^3: passing hashes between threads
by BrowserUk (Patriarch) on Sep 18, 2011 at 12:22 UTC
    Push references to already shared hashes. -- I tried this - it was slower than the deep copy

    Hm. Here's my test of several methods of passing hashes between threads:
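    (The TQ-b script itself is not reproduced in this post. Purely as a sketch of what such a comparison could look like - the structure and names below are assumptions, not the original code - something along these lines benchmarks the four approaches:)

        #!/usr/bin/perl
        # Sketch only: compare ways of passing hashes between threads via Thread::Queue.
        use strict;
        use warnings;
        use threads;
        use threads::shared;
        use Thread::Queue;
        use Storable    qw( freeze thaw );
        use Time::HiRes qw( time );

        my $N = 10000;    # number of hashes to pass
        my $H = 100;      # key/value pairs per hash

        my %proto = map { "key$_" => "value$_" } 1 .. $H;

        # One consumer thread drains the queue, rebuilding a hash from
        # whatever representation the producer enqueued.
        sub bench {
            my( $label, $producer, $consumer ) = @_;
            my $Q   = Thread::Queue->new;
            my $thr = threads->create( sub {
                while( defined( my $item = $Q->dequeue ) ) {
                    my %hash = $consumer->( $item );
                }
            } );
            my $start = time;
            $producer->( $Q ) for 1 .. $N;
            $Q->enqueue( undef );                  # end-of-work marker
            $thr->join;
            printf "%-18s %8.3f\n", "$label:", time - $start;
        }

        # 1. Plain (unshared) hashrefs: Thread::Queue deep-clones each one
        #    into shared memory on enqueue -- the slow path.
        bench( 'Unshared hashrefs',
            sub { $_[0]->enqueue( { %proto } ) },
            sub { %{ $_[0] } } );

        # 2. Flatten each hash to one string; split it again on the far side.
        bench( 'join/split',
            sub { $_[0]->enqueue( join "\x01", %proto ) },
            sub { split /\x01/, $_[0] } );

        # 3. Serialise with Storable; only a single scalar crosses the queue.
        bench( 'freeze/thaw',
            sub { $_[0]->enqueue( freeze( { %proto } ) ) },
            sub { %{ thaw( $_[0] ) } } );

        # 4. Share the hash up front and pass only references to it.
        my $shared = shared_clone( { %proto } );
        bench( 'Shared hashrefs',
            sub { $_[0]->enqueue( $shared ) },
            sub { %{ $_[0] } } );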

    And here are the results for 10000 hashes of 100 key/value pairs:

    c:\test>TQ-b -H=200 -N=10000
    Unshared hashrefs: 10.857
           join/split:  3.121
          freeze/thaw:  0.686
      Shared hashrefs:  0.265

    Here are the results for 10000 hashes of 1000 key/value pairs:

    c:\test>TQ-b -H=2000 -N=10000
    Unshared hashrefs: 117.532
           join/split:  30.482
          freeze/thaw:   2.886
      Shared hashrefs:   0.250

    Please note not just how much faster the shared-hashref method is, but that its time barely changes with hashes ten times the size.

    I still think that your design, which requires hashes to be shipped from one thread to another, is the wrong approach, but you've not supplied enough information to allow me to confirm or deny that.

    not only writing but also maintaining the original order

    Hm. This is very troublesome. Quite how you are "maintaining order" when fanning out records to multiple threads and then gathering them back together is very unclear. Nothing in your posted code, and no mechanism I am aware of, will allow you to do this.

    Threads are non-deterministic. Believing you will read records back from the 'return' queue in the same order as you fed them to the 'work' queue is a very bad assumption.
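    (To make that concrete, here is a small, hypothetical demonstration - not taken from the posted code - that fans numbered items out to several workers and prints the order in which the results come back:)

        #!/usr/bin/perl
        # Sketch only: results from multiple workers rarely return in submission order.
        use strict;
        use warnings;
        use threads;
        use Thread::Queue;
        use Time::HiRes qw( sleep );

        my $work    = Thread::Queue->new;
        my $results = Thread::Queue->new;

        my @workers = map {
            threads->create( sub {
                while( defined( my $n = $work->dequeue ) ) {
                    sleep rand 0.01;               # simulate variable parse times
                    $results->enqueue( $n );
                }
            } );
        } 1 .. 4;

        $work->enqueue( $_ ) for 1 .. 20;
        $work->enqueue( ( undef ) x @workers );    # one terminator per worker
        $_->join for @workers;

        my @out;
        while( defined( my $n = $results->dequeue_nb ) ) { push @out, $n }
        print "in : @{[ 1 .. 20 ]}\n";
        print "out: @out\n";                       # almost never 1 .. 20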


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.