in reply to Re^3: better union of sets algorithm?
in thread better union of sets algorithm?

I'm not convinced. Your numbers, certainly not your insertion numbers, don't look like O(n lg n) to me. But your benchmark has some flaws as well. First, it's using wall-clock times, so anything else on the machine running skews the results. Second, the key size does have some influence on the insert/query times. The average key length of keys 1 .. 100_000 is 4.89, but for keys 1 .. 3_200_000, the average key length is 6.65.

Here's a benchmark that uses the same key lengths, and that's using times instead of time. I also stopped after 1_500_000 keys, to avoid having to swap skew the results.

my @sizes = map {$_* 100_000} 1 .. 15; foreach my $size (@sizes) { my %h; my $from = sprintf "%010d", 1; my $to = sprintf "%010d", $size; my @start = times; foreach my $k ($from .. $to) { $h{$k} = $k; } my @end = times; my $diff = ($end[0] + $end[1]) - ($start[0] + $start[1]); my $ratio = $size / $diff; printf "%10d: %.3f inserts/sec;", $size, $ratio; @start = times; foreach my $k ($from .. $to) { $k == $h{$k} or die; } @end = times; $diff = ($end[0] + $end[1]) - ($start[0] + $start[1]); $ratio = $size / $diff; printf " %.3f queries/sec\n", $ratio; } __END__ 100000: 357142.857 inserts/sec; 625000.000 queries/sec 200000: 392156.863 inserts/sec; 526315.789 queries/sec 300000: 291262.136 inserts/sec; 434782.609 queries/sec 400000: 476190.476 inserts/sec; 459770.115 queries/sec 500000: 458715.596 inserts/sec; 442477.876 queries/sec 600000: 337078.652 inserts/sec; 447761.194 queries/sec 700000: 457516.340 inserts/sec; 429447.853 queries/sec 800000: 441988.950 inserts/sec; 425531.915 queries/sec 900000: 434782.609 inserts/sec; 410958.904 queries/sec 1000000: 427350.427 inserts/sec; 404858.300 queries/sec 1100000: 317002.882 inserts/sec; 416666.667 queries/sec 1200000: 433212.996 inserts/sec; 408163.265 queries/sec 1300000: 420711.974 inserts/sec; 398773.006 queries/sec 1400000: 425531.915 inserts/sec; 402298.851 queries/sec 1500000: 394736.842 inserts/sec; 364963.504 queries/sec
Not something I'd call O(n lg n).