in reply to Could there be ThreadedMapReduce (and/or ForkedMapReduce) instead of DistributedMapReduce?
Now, I don't have 80,000 machines. But I do have a single machine that can run multiple processes.
But I hate writing code that does this, because threads are painful, forks are painful, you get race conditions, you have to use locks...
Aside from the pains you cite, there is also the inescapable truth that if you try to do more with just the one machine, you will hit a limit -- a plateau -- beyond which further "parallelizing" will hurt rather than help.
Whether your task is mainly I/O-bound, memory-bound, or CPU-bound, adding more instances of the task will, at some point, load the limiting resource so heavily that further gains are not just impossible but reversed.
Maybe running three instances in parallel will be faster, overall, than running two at once and then one, but running four at once may well be slower than running two parallelized pairs in sequence. YMMV.
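The only way to find that plateau on a particular box is to measure it. Here is a minimal sketch using Parallel::ForkManager from CPAN; the work_unit routine, the task count, and the worker counts tried are made-up placeholders -- substitute your real "map" step and see where adding workers stops paying off.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);
use Parallel::ForkManager;

# Hypothetical stand-in for one "map" task (CPU-flavoured busywork).
sub work_unit {
    my $n = 0;
    $n += sqrt($_) for 1 .. 200_000;
    return $n;
}

my @tasks = (1 .. 32);    # 32 identical tasks to spread across workers

# Time the same workload at several levels of parallelism and compare.
for my $max_workers (1, 2, 3, 4, 8) {
    my $pm    = Parallel::ForkManager->new($max_workers);
    my $start = time();

    for my $task (@tasks) {
        $pm->start and next;    # parent: queue the next task
        work_unit();            # child: do the work
        $pm->finish;            # child: exit
    }
    $pm->wait_all_children;

    printf "%2d worker(s): %.2f s\n", $max_workers, time() - $start;
}
```

On a machine with, say, four cores you would typically see the timings drop up to around four workers and then flatten or climb again, which is exactly the plateau described above.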
Re^2: Could there be ThreadedMapReduce (and/or ForkedMapReduce) instead of DistributedMapReduce?
by tphyahoo (Vicar) on Oct 20, 2006 at 17:51 UTC