I had the same thought when reading through Mastering Algorithms with Perl, but I'm actually working with strings here, and I can't think of a cheap way to map strings to bits that would work for this. It may be worth assigning an integer to each string, since I could probably do that cheaply as the strings are added to my database, and then the union could be done with bit vectors.

Re^3: better union of sets algorithm?
by BrowserUk (Patriarch) on Mar 11, 2005 at 19:03 UTC

    Yes. The mapping is the crux of the issue.

    If you're doing the unions (or intersections, symmetric differences) on a regular basis, then it can be worth the effort of building a unique index (offline) and replacing your sets of strings with bitvectors mapped against that index.
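
    As a minimal sketch of that mapping step: the sample strings and the names %pos_of, @string_at and to_bitvector are assumptions made up for illustration, not anything from the original post. Each distinct string gets the next free bit position, and each set is then stored as a bit string built with vec.

```perl
use strict;
use warnings;

# Hypothetical sample data: two sets of strings.
my @set_a = qw(apple banana cherry);
my @set_b = qw(banana date);

# Build the unique index: each distinct string gets the next free bit position.
my (%pos_of, @string_at);
for my $s (@set_a, @set_b) {
    next if exists $pos_of{$s};
    $pos_of{$s} = scalar @string_at;
    push @string_at, $s;
}

# Convert a list of strings into a bit vector against the index.
sub to_bitvector {
    my @strings = @_;
    my $bits = '';
    # Touch the highest bit so every vector has the same length.
    vec($bits, $#string_at, 1) = 0 if @string_at;
    vec($bits, $pos_of{$_}, 1) = 1 for @strings;
    return $bits;
}

my $vec_a = to_bitvector(@set_a);
my $vec_b = to_bitvector(@set_b);

# unpack 'b*' shows the bits in the same order vec numbers them.
print unpack('b*', $vec_a), "\n";
print unpack('b*', $vec_b), "\n";
```

    In practice the index would presumably live in the database rather than in memory; the hash here only stands in for it.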

    You then hold and maintain your sets as bitvectors, and all the set manipulations become easy and efficient, except adding (and, to a lesser extent, displaying), which requires mapping.
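
    To illustrate why the manipulations become cheap, here is a rough sketch using Perl's string bitwise operators on vec-built bit strings. The index contents and the helper names bits_from and members are hypothetical; only displaying a result needs the mapping back to strings.

```perl
use strict;
use warnings;

# Assumed index: bit position -> string (hypothetical sample data).
my @string_at = qw(apple banana cherry date);

# Build a bit vector from a list of bit positions.
sub bits_from {
    my @positions = @_;
    my $v = '';
    vec($v, $#string_at, 1) = 0;    # same length for every vector
    vec($v, $_, 1) = 1 for @positions;
    return $v;
}

# Map a bit vector back to its member strings for display.
sub members {
    my ($v) = @_;
    return map { $string_at[$_] } grep { vec($v, $_, 1) } 0 .. $#string_at;
}

my $set_a = bits_from(0, 1, 2);    # apple banana cherry
my $set_b = bits_from(1, 3);       # banana date

# With two string operands (and the 'bitwise' feature not enabled),
# these operators work bit by bit on the whole strings.
my $union     = $set_a | $set_b;
my $intersect = $set_a & $set_b;
my $sym_diff  = $set_a ^ $set_b;

print "union:     @{[ members($union) ]}\n";
print "intersect: @{[ members($intersect) ]}\n";
print "sym diff:  @{[ members($sym_diff) ]}\n";
```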

    The index doesn't need to be ordered in any way, just unique, though ordering it does allow the use of a binary chop for lookups when adding (or displaying).
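
    A binary chop over an ordered index might look something like the sketch below. The sample index and the function name index_of are assumptions for illustration. The returned position doubles as the bit number, which presumes the index is rebuilt offline rather than having strings inserted mid-array, since an insertion would renumber the bits.

```perl
use strict;
use warnings;

# Assumed index, sorted with the default string comparison.
my @index = sort qw(date apple cherry banana);

# Return the position of $key in the sorted @$index, or -1 if absent.
sub index_of {
    my ($index, $key) = @_;
    my ($lo, $hi) = (0, $#$index);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $cmp = $key cmp $index->[$mid];
        return $mid if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 }
        else          { $lo = $mid + 1 }
    }
    return -1;
}

print index_of(\@index, 'cherry'), "\n";
print index_of(\@index, 'fig'), "\n";
```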

    Whether the offline work of mapping can be amortised to effect an overall saving depends on how often your sets change and how often you need to do the unions.

