in reply to Re^4: shared scalar freed early (queue)
in thread shared scalar freed early

Oh, so it is "slower" because you are doing all kinds of blocking between threads (not because it is more expensive). Don't do that.

If you need to preserve order, then create slots for the data and let the threads fill in the slots at their leisure. Way, way simpler code. I failed at trying to understand what the point of your overly complicated sample code was. Sorry.

Don't pass thread handles through queues. Pass work items. If you only have as many threads as you need, then you don't need the semaphore at all.

- tye        

Re^6: shared scalar freed early (block)
by chris212 (Scribe) on Feb 24, 2017 at 15:40 UTC

    It will only block between threads with testa (my approach) if the next thread to output is not the first one finished. As soon as the next thread to output is finished, a thread is created for the next chunk to be processed. This can probably even be improved if I up the semaphore at the end of the worker thread instead of after the next one to output is joined.

    With testb (your approach), the semaphore is up'ed as soon as any chunk of data finishes processing, allowing another chunk to be queued and processed. That should theoretically be more efficient, but it isn't. It is 5x slower. Even with ikegami's clean and elegant solutions using your approach, it is still 5x slower. I suspect it has to do with how memory is managed when passing data structures to threads, as opposed to making the data shared in a queue.

      With testb (your approach), the semaphore is up'ed

      No, with my approach, there is no semaphore.

      Create N threads. Have them read work items from an input queue. Create an output thread that reads items from an output queue. Now feed input to the input queue. Create a work item by sharing a new HASH or ARRAY ref. Store a sequence number into the work item. Store what the worker needs to know into the work item. Add the work item to the input queue. When a worker thread pulls a work item, it replaces what only it needs to know with what it computes and then puts the result (still containing a sequence number) into the output queue.

      The output thread just pulls stuff out of the output queue. It starts off knowing nothing other than the first sequence number. When it pulls a work item, if that work item's sequence number matches the next sequence number, then it can output it immediately. If not, it stores it into a not-shared hash with the sequence number as the key. Every time it outputs something it also increments the sequence number and looks the result up in its local hash. If that finds a match, then it deletes it from the hash and outputs it (which triggers a repeat of this process).
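      The two paragraphs above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the worker count, the doubling "computation", and all variable names are made up, and `Thread::Queue->end` assumes a reasonably recent Thread::Queue (3.01+). Note that `enqueue` shares each item for us via `shared_clone`, so the work items need no explicit `:shared` attribute.

      ```perl
      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $N     = 4;
      my $in_q  = Thread::Queue->new;
      my $out_q = Thread::Queue->new;

      # N worker threads: pull a work item, replace its input with a result,
      # and pass it on (still carrying its sequence number).
      my @workers = map {
          threads->create(sub {
              while (defined(my $item = $in_q->dequeue)) {
                  $item->{result} = $item->{data} * 2;   # placeholder computation
                  delete $item->{data};
                  $out_q->enqueue($item);
              }
          });
      } 1 .. $N;

      # Output thread: emit results in sequence order, parking out-of-order
      # items in a non-shared hash keyed by sequence number.
      my $printer = threads->create({ context => 'list' }, sub {
          my (%pending, @ordered);
          my $next = 0;
          while (defined(my $item = $out_q->dequeue)) {
              $pending{ $item->{seq} } = $item->{result};
              while (exists $pending{$next}) {
                  push @ordered, delete $pending{$next};
                  $next++;
              }
          }
          return @ordered;
      });

      # Feed the input queue, then shut everything down in order.
      $in_q->enqueue({ seq => $_, data => $_ }) for 0 .. 9;
      $in_q->end;                    # workers exit once the queue drains
      $_->join for @workers;
      $out_q->end;                   # lets the output thread finish
      my @ordered = $printer->join;
      print "@ordered\n";            # prints "0 2 4 6 8 10 12 14 16 18"
      ```

      The only blocking here is threads waiting on the queues themselves; no thread ever waits for a *particular* other thread, which is what the semaphore approach forces.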

      If I were writing it, I'd probably at least look at having the main thread handle both input and output. Then it could track (with a simple non-shared counter) how many work items are potentially in the combined queues and avoid letting that build up too high so that the sources of inputs can notice that intake is falling behind instead of just having RAM usage grow unbounded with no other signs of problems.
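      A hypothetical sketch of that variant, with the main thread feeding input, draining output, and using a plain non-shared counter to cap the in-flight items (the cap of `2 * $N` and the `+ 1` computation are illustrative only):

      ```perl
      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $N     = 4;
      my $limit = 2 * $N;            # illustrative cap on in-flight items
      my $in_q  = Thread::Queue->new;
      my $out_q = Thread::Queue->new;

      my @workers = map {
          threads->create(sub {
              while (defined(my $item = $in_q->dequeue)) {
                  $item->{result} = $item->{data} + 1;   # placeholder computation
                  $out_q->enqueue($item);
              }
          });
      } 1 .. $N;

      my ($in_flight, $next_seq) = (0, 0);
      my (%pending, @done);
      my $drain_one = sub {          # pull one result, emit now-in-order items
          my $item = $out_q->dequeue;
          $in_flight--;
          $pending{ $item->{seq} } = $item->{result};
          while (exists $pending{$next_seq}) {
              push @done, delete $pending{$next_seq};
              $next_seq++;
          }
      };

      for my $seq (0 .. 19) {
          # Backpressure: refuse to enqueue more input until the combined
          # queues shrink, so RAM usage cannot grow unbounded.
          $drain_one->() while $in_flight >= $limit;
          $in_q->enqueue({ seq => $seq, data => $seq });
          $in_flight++;
      }
      $drain_one->() while $in_flight > 0;   # collect the stragglers
      $in_q->end;
      $_->join for @workers;
      print "@done\n";                       # prints "1 2 3 ... 20" in order
      ```

      Because the main thread sees every item go in and come out, `$in_flight` needs no sharing or locking at all.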

      - tye        

        How is that different from ikegami's code, which uses an input queue and an output queue with a sequence number? That is still 5x slower.