in reply to diff (different enough)

I don't know if it would help any, but if you have to diff a million strings, why not split um up into subgroups and fork a process to diff each subgroup (if you aren't doing that already)... I mean if you're forced to brute force something, why not do it the right way, fork fork fork! :) my humble opinion :) - Magius_AR

Replies are listed 'Best First'.
(chromatic) RE: RE: diff (different enough)
by chromatic (Archbishop) on Jul 22, 2000 at 01:59 UTC
    At some point, the divide and conquer approach has so much overhead that the simple solutions are better. fork() is especially expensive, as once you get a few processes out there, context switching between them will take more time than plodding along in a for loop (which will probably optimize to a handful of machine code expressions anyway).

    I really like the xor solution someone else pointed out, and you could probably do the 'x characters at a time' on it, too.

      We also considered splitting it up. We couldn't reason out the interprocess communication properly. In actual use, this algorithm get's run every time we insert a new string. So we couldn't figure out how to reliably make sure the kids were all working on mdifferent strings--I mean, we coulda ... we just thought it was a dead end. Unless there _a lot_ of kids (on lotsa computers), it just didn't sound useful.