in reply to Re^5: If I am tied to a db and I join a thread, program chrashes
in thread If I am tied to a db and I join a thread, program chrashes
I've spent 2 days staring at, and thinking about, that benchmark, and it is really hard not to be horribly scathing about it. But I don't want to do that, because Coro is a very clever piece of code by an obviously very bright guy.
Still, the best conclusion I can draw from the benchmark is that if you do the same very strange and silly thing with both iThreads and Coro, Coro will do it more quickly. Which is hardly a great recommendation. It would be all too easy to draw other, even less flattering conclusions regarding the purpose of the benchmark.
To explain what I mean, let's examine what the benchmark code does. To this end, I'll use this version, which just removes all the Coro code so that we can concentrate on what it actually does.
It starts $T threads. We'll come back to these shortly.
It then starts $T more threads, each of which goes into an *endless loop*:
    while() {
(Yes! I was baffled by that construct, but if you run perl -E"while(){ say 'Hi' }" it's an endless loop.)
each iteration of which constructs two $N x $N matrices (he uses floats but no matter). So for $N = 2 they might look something like this:
    @a = [ [ 1, 2 ], [ 3, 4 ] ]
    @b = [ [ 5, 6 ], [ 7, 8 ] ]
It then pushes (*shared*) arrays containing *copies* of each combination of pairs of rows from these two matrices, plus their original positions, onto a work queue.
    $Q = [ 0, 0, [ 1, 2 ], [ 5, 6 ] ]
         [ 0, 1, [ 1, 2 ], [ 7, 8 ] ]
         [ 1, 0, [ 3, 4 ], [ 5, 6 ] ]
         [ 1, 1, [ 3, 4 ], [ 7, 8 ] ]
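For concreteness, the producer side of that pattern comes down to something like this. This is my own sketch, not the benchmark's code; $Q, @a, @b and $N are the names used above, and the details are assumptions:

    use threads;
    use threads::shared;
    use Thread::Queue;

    my $Q = Thread::Queue->new;

    for my $x ( 0 .. $N-1 ) {
        for my $y ( 0 .. $N-1 ) {
            # Shared *copies* of one row from each matrix, plus their positions
            my @rowA :shared = @{ $a[$x] };
            my @rowB :shared = @{ $b[$y] };
            my @item :shared = ( $x, $y, \@rowA, \@rowB );
            $Q->enqueue( \@item );
        }
    }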
So, back to those first 4 threads he started.
They sit in governed, but also *endless* loops, reading from the work queue.
When they get one of those $Q elements above, they unpack the elements into local variables, and then copy the contents of both the subarrays (which are shared, but only ever processed by a single thread!) into *local (anonymous array!?) copies*.
Each thread then iterates the two local subarrays in parallel, summing their products. It then constructs a local, unshared anonymous array containing the x & y from the original input, plus the sum of products. It then shares that anonymous array (which empties it!), before pushing it back onto a results queue.
Which means he is pushing empty shared anonymous arrays onto the results queue?
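Put together, each worker amounts to roughly this. Again, this is my sketch, not his code; $Q and $R are assumed to be the work and results Thread::Queue objects, and the field order is an assumption:

    use threads;
    use threads::shared;

    async {
        while ( my $item = $Q->dequeue ) {
            my ( $x, $y, $rowA, $rowB ) = @$item;   # unpack the shared item
            my @a = @$rowA;                         # copy the shared rows into
            my @b = @$rowB;                         # unshared local arrays
            my $sum = 0;
            $sum += $a[$_] * $b[$_] for 0 .. $#a;   # sum of products
            my $result = [ $x, $y, $sum ];          # plain anonymous array...
            share( @$result );                      # ...which share() then wipes
            $R->enqueue( $result );                 # so an empty array gets queued
        }
    };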
Now finally, the main thread sits reading from the results queue, ignoring the results but counting them. Then, after an (arbitrary) count of 63(!?), it starts a timer. Then it continues counting and discarding results until the count satisfies this condition:
    elsif (($count & 63) == 0) {
and if at that point, at least 5 seconds have elapsed(!?):
    if (time > $t + 5) {
it prints out a number that is some manipulation of the time it ran, the size of the matrices, and the count,
    printf "%f\n", $count / ($N * $N * (time - $t)); last;
and exits with the 8 threads still running(!?).
None of this makes any sense whatsoever.
At the very least he would have to share the results array *before* putting the data into it (or clone the data into shared space) for anything to actually reach the main thread.
As it is, he populates a non-shared anonymous array and then shares it, which wipes the contents; meaning what he pushes onto the results queue is an empty array.
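A minimal illustration of the difference (variable names are placeholders; $R stands for the results queue):

    use threads::shared qw( share shared_clone );

    my $plain = [ $x, $y, $sum ];       # populated, unshared anonymous array
    share( @$plain );                   # sharing it discards the contents
    # @$plain is now empty

    my $kept = shared_clone( [ $x, $y, $sum ] );  # copies the data into a
                                                  # new shared array instead
    $R->enqueue( $kept );                         # this one arrives intact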
But those are the least of the problems.
A modern processor can complete the entire multiplication of the two 50x50 matrices (using the simplest naive algorithm) in much less than a single timeslice.
And in far less time than it takes to copy the rows into shared arrays, queue them, dequeue them, copy them back into local arrays, share the results, and queue them back again.
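If you want to see what the naive single-threaded multiply actually costs on your own hardware, something like this (illustrative only, not part of the benchmark) will time it:

    use Benchmark qw( timethis );

    my $N = 50;
    my @a = map { [ map { rand } 1 .. $N ] } 1 .. $N;
    my @b = map { [ map { rand } 1 .. $N ] } 1 .. $N;

    timethis( 100, sub {
        my @c;
        for my $i ( 0 .. $N-1 ) {
            for my $j ( 0 .. $N-1 ) {
                my $sum = 0;
                $sum += $a[$i][$_] * $b[$_][$j] for 0 .. $N-1;
                $c[$i][$j] = $sum;
            }
        }
    } );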
The whole point of shared data is that you can share it. There is no point in making data shared and then copying it to local variables to use it!
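That is, the worker could just as well read the shared rows in place (a sketch, using the same assumed names as the worker sketch above):

    my ( $x, $y, $rowA, $rowB ) = @$item;
    my $sum = 0;
    $sum += $rowA->[$_] * $rowB->[$_] for 0 .. $#$rowA;   # no local copies needed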
To the very best of my ability to interpret it, the number it finally prints is as near to a random value as I can discern.
It is of absolutely no value whatsoever as a benchmark of threading!
I'm glad that you noticed that Coro's main claim--"Coro - the only real threads in perl"--is bogus, because most people will not. iThreads are "real threads"; and all the more real because they can at least make use of multiple cores. Unlike any flavour of user-space cooperative threading.
Just as most of them will take the benchmark results at face value, without ever looking closely enough to see that they are equally bogus.
If you break a piece of data into iddy-biddy pieces; subject it to a bunch of unnecessary copying; farm it off to threads for manipulations that take far less than a single timeslice to process; copy it some more before queueing it back to the main thread to re-assemble--whilst all the time continuing to fire more and more data onto the queues, data that you are never going to process but that will cause lock contention that slows everything down--then it will take longer than if you just used a single process. But that's not news!
It is perfectly possible to utilise multiple cores, via multiple threads, to improve the performance of matrix multiplication--provided the matrices involved are sufficiently large to actually warrant it.
But this is not the way to do it.
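For contrast, one workable shape--my own sketch with made-up names, not anything from the thread--is to hand each of a small number of threads its own stripe of rows of the result, so the work per thread dwarfs the cost of starting and joining the threads, and nothing needs to be shared at all:

    use threads;

    sub mmult_threaded {
        my ( $A, $B, $workers ) = @_;
        my @thr;
        for my $w ( 0 .. $workers - 1 ) {
            my $t = threads->create( sub {
                my @part;
                # each thread handles rows $w, $w + $workers, $w + 2*$workers, ...
                for ( my $i = $w; $i < @$A; $i += $workers ) {
                    for my $j ( 0 .. $#{ $B->[0] } ) {
                        my $sum = 0;
                        $sum += $A->[$i][$_] * $B->[$_][$j] for 0 .. $#$B;
                        push @part, [ $i, $j, $sum ];
                    }
                }
                return \@part;    # results come back via join(), not a queue
            } );
            push @thr, $t;
        }
        my @C;
        for my $t ( @thr ) {
            $C[ $_->[0] ][ $_->[1] ] = $_->[2] for @{ $t->join };
        }
        return \@C;
    }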