You can do slightly better than what you just suggested for a mergesort of k lists (when k is not super-small). If you have k sorted lists, pop the first item off each and put them into a heap, with some indication of which list they came from. Take off the lowest item off the heap, if it's not a duplicate of the item we last saw, put it into the output list. Pop the list this item came from and put it into the heap. Repeat.

This gives you an O(n log k) mergesort of k lists (where n is the total number of elements in all the lists). Blindly searching through the first elements of each list as you suggest gives an O(nk) algorithm. Of course, when the number of lists is constant, they are both still O(n), but the heap version will have a lower constant (unless k is very small).

What I fail to understand is why you say that perrin's hash algorithm takes O(n log n) time. No, a hash dereferencing operation takes amortized constant time, and you are just doing a dereference for each element in each list. That's a linear time algorithm.

I'm afraid perrin won't be able to better than the linear time algorithm he presents (other than small tweaks of the constant, which he explicitly expressed disinterest in). Think about it, how could you remove duplicates in the general case without at least looking at each element? Yes, there are surely some obscene special cases where you might do better at first, but it's a safe bet that you're going to copy the result into an array somewhere which takes O(n) anyway...

blokhead


In reply to Re^2: better union of sets algorithm? by blokhead
in thread better union of sets algorithm? by perrin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.