I'm not convinced. Your numbers, certainly not your insertion numbers, don't look like O(n lg n) to me. But your benchmark has some flaws as well. First, it's using wall-clock times, so anything else on the machine running skews the results. Second, the key size does have some influence on the insert/query times. The average key length of keys 1 .. 100_000 is 4.89, but for keys 1 .. 3_200_000, the average key length is 6.65.

Here's a benchmark that uses the same key lengths, and that's using times instead of time. I also stopped after 1_500_000 keys, to avoid having to swap skew the results.

my @sizes = map {$_* 100_000} 1 .. 15; foreach my $size (@sizes) { my %h; my $from = sprintf "%010d", 1; my $to = sprintf "%010d", $size; my @start = times; foreach my $k ($from .. $to) { $h{$k} = $k; } my @end = times; my $diff = ($end[0] + $end[1]) - ($start[0] + $start[1]); my $ratio = $size / $diff; printf "%10d: %.3f inserts/sec;", $size, $ratio; @start = times; foreach my $k ($from .. $to) { $k == $h{$k} or die; } @end = times; $diff = ($end[0] + $end[1]) - ($start[0] + $start[1]); $ratio = $size / $diff; printf " %.3f queries/sec\n", $ratio; } __END__ 100000: 357142.857 inserts/sec; 625000.000 queries/sec 200000: 392156.863 inserts/sec; 526315.789 queries/sec 300000: 291262.136 inserts/sec; 434782.609 queries/sec 400000: 476190.476 inserts/sec; 459770.115 queries/sec 500000: 458715.596 inserts/sec; 442477.876 queries/sec 600000: 337078.652 inserts/sec; 447761.194 queries/sec 700000: 457516.340 inserts/sec; 429447.853 queries/sec 800000: 441988.950 inserts/sec; 425531.915 queries/sec 900000: 434782.609 inserts/sec; 410958.904 queries/sec 1000000: 427350.427 inserts/sec; 404858.300 queries/sec 1100000: 317002.882 inserts/sec; 416666.667 queries/sec 1200000: 433212.996 inserts/sec; 408163.265 queries/sec 1300000: 420711.974 inserts/sec; 398773.006 queries/sec 1400000: 425531.915 inserts/sec; 402298.851 queries/sec 1500000: 394736.842 inserts/sec; 364963.504 queries/sec
Not something I'd call O(n lg n).

In reply to Re^4: better union of sets algorithm? by Anonymous Monk
in thread better union of sets algorithm? by perrin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.