in reply to Re^2: How to use my two processors
in thread Need a faster way to find matches

If you ran that on a 2-core system, I am pleasantly surprised that you continued to make appreciable gains with more than 2 threads. Very surprised that they were still relatively worthwhile for 3 and 4--are you sure you don't have 2 cpus each with 2 hyper-threads?

I've little experience of POE. Historically, I've been wary of the overhead of IPC via *nix pipes, but it was demonstrated to me a year or two ago that things have moved on markedly since my days with HPUX-10, so it may work for you.

As for pipelined threads: the problem is that unless the producer thread produces a steady and relatively high stream of output, the consumer thread tends to get a timeslice, wake up, process one item and then have to relinquish the rest of its timeslice because there's no more work waiting for it. That tends to lead to a low ratio of work done to context switches. That's why I favour the partitioning approach.
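By partitioning I mean something along these lines. This is only a rough sketch, assuming a perl built with threads support; `count_matches` and its divisible-by-7 test are hypothetical stand-ins for whatever the real per-item work is. Each thread gets its own contiguous slice of the input up front, so nobody wakes up just to process a single item.

```perl
use strict;
use warnings;
use threads;

# Hypothetical per-slice work: count the items that pass some test.
sub count_matches {
    my @items = @_;
    return scalar grep { $_ % 7 == 0 } @items;
}

# Split the input into (at most) $n roughly equal slices, run one
# thread per slice, then add up the per-thread results.
sub partitioned_count {
    my ( $n, @items ) = @_;
    my $chunk = int( ( @items + $n - 1 ) / $n ) || 1;
    my @workers;
    while ( my @slice = splice @items, 0, $chunk ) {
        # Assigning to a scalar makes the thread return in scalar context.
        my $thr = threads->create( \&count_matches, @slice );
        push @workers, $thr;
    }
    my $total = 0;
    for my $thr (@workers) {
        my $count = $thr->join;
        $total += $count;
    }
    return $total;
}

print partitioned_count( 2, 1 .. 10_000 ), "\n";
```

The only tunable is the number of threads, which you would normally set to the number of cores.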

Pipelined processes tend to alleviate that problem by buffering the pipe, thereby deferring the need to switch contexts until there is enough work for the consumer to make the context switch worthwhile. You can emulate that using threads by batching up the producer's output, but it tends to lead to code that needs to be re-tuned manually for each different hardware setup--you initially tune it for 2 cpus and then have to re-tune it if you move it to a 4-core box etc. Even a different workload can screw up the tuning. You set it up for a development box, but when you move it to an identical production box, you have to re-tune it because the workload mix is different.

I've encountered similar problems with both event-driven and co-routine setups. If you buffer too little, you end up wasting cycles context switching to the consumer too frequently. If you buffer too much, the producer spins cycles waiting for the consumer to catch up.
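To make the batching idea concrete, here is a minimal sketch using Thread::Queue (assuming version 2.00 or later, which can pass array references). The name `run_batched`, the doubling "work" and the batch size of 256 are all illustrative assumptions, not recommendations. The batch size is exactly the knob that needs re-tuning per box and per workload.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical pipeline: the producer doubles each input item and the
# consumer sums the results. Items cross the queue in batches.
sub run_batched {
    my ( $batch_size, @input ) = @_;
    my $q = Thread::Queue->new;

    # Consumer: each dequeue returns a whole batch (an array ref), so a
    # single wake-up pays for many items of work.
    my $consumer = threads->create( sub {
        my $sum = 0;
        while ( defined( my $batch = $q->dequeue ) ) {
            $sum += $_ for @$batch;
        }
        return $sum;
    } );

    # Producer: accumulate output locally; enqueue only full batches.
    my @batch;
    for my $item (@input) {
        push @batch, $item * 2;    # stand-in for the real work
        if ( @batch >= $batch_size ) {
            $q->enqueue( [@batch] );
            @batch = ();
        }
    }
    $q->enqueue( [@batch] ) if @batch;    # flush the final partial batch
    $q->enqueue(undef);                   # sentinel: no more work

    my $sum = $consumer->join;
    return $sum;
}

print run_batched( 256, 1 .. 10_000 ), "\n";
```

With a batch size of 1 this degenerates to the one-item-per-wake-up case; with a very large one the consumer sits idle until the producer has nearly finished.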

Good luck. Let us know how you get on. (I'm intrigued by the purpose of this?)


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"I'd rather go naked than blow up my ass"

Re^4: How to use my two processors
by remzak (Acolyte) on Jan 19, 2010 at 15:55 UTC
    I completed the project (as complete as it will ever be). I ended up using one child thread with a queue to create two parallel processes. It was exactly what I wanted, and it was extremely simple to implement. The main process created the valid numbers; the child process paired them up. The two ran very much in parallel since the pairing algorithm actually ran more efficiently working on a growing list of numbers. If I had more processors, I'm not sure how I'd partition the logic.
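    For anyone following along, the arrangement looks roughly like the sketch below. It is not the project's actual code: `is_valid`, `pairs_with` and `find_pairs` are hypothetical placeholders, since the real validity test and pairing rule aren't shown here. The main thread generates the valid numbers and one child thread pairs each new arrival against the growing list of numbers it has already seen.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-ins for the real validity test and pairing rule.
sub is_valid   { my ($n) = @_;       return $n % 3 == 0 }
sub pairs_with { my ( $x, $y ) = @_; return ( $x + $y ) % 5 == 0 }

sub find_pairs {
    my ($limit) = @_;
    my $q = Thread::Queue->new;

    # Child thread: pair each arriving number with everything seen so far.
    my $child = threads->create( sub {
        my @seen;
        my $count = 0;
        while ( defined( my $n = $q->dequeue ) ) {
            $count += grep { pairs_with( $_, $n ) } @seen;
            push @seen, $n;
        }
        return $count;
    } );

    # Main thread: generate the valid numbers and hand them over.
    for my $n ( 1 .. $limit ) {
        $q->enqueue($n) if is_valid($n);
    }
    $q->enqueue(undef);    # sentinel: tell the child there is no more input

    my $count = $child->join;
    return $count;
}

print find_pairs(30), "\n";
```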

    I think I'd have to re-frame the overall algorithm to take advantage of many processors; then, it might be more efficient to multi-thread and partition as you suggested.

    The time dropped to 44% of the original (2.25 times faster). Some of the speed came not from the threading, but just learning how to write more efficient perl statements. I learned that there are expensive operations in perl and some very fast operations... I ended up tweaking many, and changing how I was storing the data to take advantage of these realities.

    I am sure the program could be made even faster once I understand perl a bit better.

    Thank you to everyone for your help!