in reply to Multi-thread combining the results together

I usually start with this: How to create thread pool of ithreads

I have also asked a question about sharing complex/nested data between threads here: ithreads, locks, shared data: is that OK?

You seem to need to share only an array and a hash of scalars, which is much simpler and faster to handle. The array can go through Thread::Queue (edit: see also: Re: Multi-thread combining the results together); for the hash, see below. You may be tempted to share a hash between threads to store the %result. I would say don't in this case, because the data will be duplicated in each thread. It is not clear to me whether threads::shared actually shares a reference guarded by locks and semaphores, or whether it does a transparent and sophisticated data duplication behind the scenes. From the manual:

By default, variables are private to each thread, and each newly created thread gets a private copy of each existing variable. This module allows you to share variables across different threads

In order to avoid sharing a hash I use this trick: push into the done queue a string like "$token=[@line_results]". When all threads are joined I convert the strings to the results hash.
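A minimal sketch of that trick, under some assumptions of mine: the %input data, the worker() sub and the doubling "processing" are all made up for illustration, and the string format here is "token=v1,v2,..." (comma-joined) so that it is trivial to split again. It only works if tokens contain no "=" and values contain no ",".

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical input: token => list of numbers to process.
my %input = (
    alpha => [1, 2, 3],
    beta  => [4, 5],
    gamma => [6],
);

# All work is queued before the threads start, so non-blocking reads suffice.
my $work_q = Thread::Queue->new(keys %input);
my $done_q = Thread::Queue->new();

sub worker {
    # dequeue_nb() returns undef once the work queue is empty.
    while (defined(my $token = $work_q->dequeue_nb())) {
        # Each thread has its own private copy of %input to read from.
        my @line_results = map { $_ * 2 } @{ $input{$token} };
        # One plain string per token, no shared hash involved.
        $done_q->enqueue("$token=" . join(',', @line_results));
    }
}

my @threads = map { threads->create(\&worker) } 1 .. 2;
$_->join() for @threads;

# Back in the main thread: decode the strings into the results hash.
my %result;
while (defined(my $item = $done_q->dequeue_nb())) {
    my ($token, $values) = split /=/, $item, 2;
    $result{$token} = [ split /,/, $values ];
}

print "$_ => @{ $result{$_} }\n" for sort keys %result;
```

The hash only ever exists in the main thread, so nothing has to be locked while it is being built.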

It will make a huge difference in performance if you minimise your reads/writes to the Queue by, for example, reading (edit: and dequeueing!) all the data that particular thread will need once at the beginning, instead of in a loop. Do the processing and write the results to a temporary thread-private variable. Then write that variable to the Queue in one go when done, which eliminates the locking and unlocking on every single write.
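A rough sketch of the batching idea, again with made-up pieces: the batch size, the squaring "work" and the worker() sub are mine, not from the original code. Instead of grabbing everything once, each thread here pulls a chunk at a time with dequeue_nb(COUNT) and pushes the chunk's results back with a single enqueue(LIST), so the queue is locked once per batch rather than once per item.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use List::Util qw(sum);

# Hypothetical batch size. It must be > 1: with a count of 1, dequeue_nb()
# returns undef on an empty queue and the list test below would never end.
my $BATCH  = 100;
my $work_q = Thread::Queue->new(1 .. 1000);
my $done_q = Thread::Queue->new();

sub worker {
    # dequeue_nb(COUNT) returns up to COUNT items in one call,
    # and an empty list once the queue is drained.
    while (my @batch = $work_q->dequeue_nb($BATCH)) {
        my @local = map { $_ * $_ } @batch;   # thread-private results
        $done_q->enqueue(@local);             # one write for the whole batch
    }
}

my @threads = map { threads->create(\&worker) } 1 .. 4;
$_->join() for @threads;

# Collect everything in the main thread in a single read.
my @all   = $done_q->dequeue_nb($done_q->pending());
my $total = sum(@all);
print scalar(@all), " results, total $total\n";
```

How big a batch should be depends on how expensive one item is to process; with a slow regex per item the queue overhead may matter less than it does in this toy example.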

So reducing your running time in proportion to the number of threads is a holy grail, because reading and writing shared data has a cost. That cost is why parallelism can sometimes be worse for performance! Ah, the eternal battle between cooks and romantics ("too many cooks in the kitchen" vs "many hands make light work"). Aim to share as little as possible.

Of course nothing stops you from requesting a memory segment shareable among all processes/threads via IPC::Shareable. There you can de-/serialise any complex data structure to be shared, but you will need to take care of the locking yourself. Recent article with some code: Re: IPC::Shareable sometimes leaks memory segments

bw, bliako

Re^2: Multi-thread combining the results together
by Marshall (Canon) on Jul 25, 2019 at 10:34 UTC
    Thanks for the input! You, Grandfather and 1nickt have given some ideas to work on.

    My single threaded code uses a hash for the output, but I don't need to do that. Each thread can push a ref to Array (a row) onto a common output queue and I can deal with that after everybody is finished. Converting 80K rows to a hash or sorting this is a "no brainer" compared with the time it takes to run the regex.

    I wrote the build_regex() function back in 2007 and I'm at the point where what was easily fast enough a decade ago no longer is. I will be rethinking the algorithm, but if I can "juice this baby up by a factor of 3-4", that will give me enough time to ponder a new approach to the problem.

      Also, I found this very useful: Threads From Hell #1: How To Share A Hash [SOLVED].

      This is far-fetched, but in case you want to run a server (as a separate script) that provides data to workers (separate single-threaded processes), this can get you started: Re: Disc burning. It is good for DB access and for distributing work over a cluster - a grand design for sure.