RE: diff (different enough)

I don't know if it would help any, but if you have to diff a million strings, why not split um up into subgroups and fork a process to diff each subgroup (if you aren't doing that already)... I mean if you're forced to brute force something, why not do it the right way, fork fork fork! :) my humble opinion :) - Magius_AR

Comment on RE: diff (different enough)

Replies are listed 'Best First'.
(chromatic) RE: RE: diff (different enough) by chromatic (Archbishop) on Jul 22, 2000 at 01:59 UTC
At some point, the divide and conquer approach has so much overhead that the simple solutions are better. fork() is especially expensive, as once you get a few processes out there, context switching between them will take more time than plodding along in a for loop (which will probably optimize to a handful of machine code expressions anyway). I really like the xor solution someone else pointed out, and you could probably do the 'x characters at a time' on it, too.	[reply]
RE: (chromatic) RE: RE: diff (different enough) by jettero (Monsignor) on Jul 22, 2000 at 16:36 UTC
We also considered splitting it up. We couldn't reason out the interprocess communication properly. In actual use, this algorithm get's run every time we insert a new string. So we couldn't figure out how to reliably make sure the kids were all working on mdifferent strings--I mean, we coulda ... we just thought it was a dead end. Unless there _a lot_ of kids (on lotsa computers), it just didn't sound useful.	[reply]