in reply to Algorithm Efficiency vs. Specialized Hardware?

I've done some benchmarking of my own over various dataset sizes.
dataset size          Grep           Max       Ternary
         100   319704.72/s   167783.44/s   160956.63/s
        1000   324017.59/s   158933.12/s   157394.83/s
       10000   328377.50/s   162308.63/s   159661.13/s
      100000   332964.44/s   160964.22/s   157891.99/s
You will notice that the rates for dataset sizes of 100, 1000, 10000 and 100000 are nearly identical for each algorithm. This clearly shows that, for datasets of that size, it is not the size of the dataset that is the limiting factor in performance. I suspect the performance issue is "Algorithmic Friction": the overhead involved in making the calls themselves. Also remember that sort is implemented in the C library, so it is really fast (still O(n log n), but a very fast O(n log n)).

What I expect you're seeing is that it is much quicker to sort (implemented in compiled C code) and index the last element than it is to iterate through an array (even once) in an interpreted language, for datasets of the sizes we are considering.
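As a rough illustration of that gap (this is my own sketch, not the benchmark above), you can pit the C-level sort-and-index against a pure-Perl loop with Benchmark's cmpthese; the list of 1000 keys here is just an arbitrary example:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    my @keys = (1 .. 1000);    # arbitrary example size

    cmpthese( -1, {
        # sort runs in C, then we index the last element
        SortIndex => sub { my $max = ( sort { $a <=> $b } @keys )[-1]; },
        # one pass over the list, but every step is interpreted
        PerlLoop  => sub {
            my $max = 0;
            for (@keys) { $max = $_ if $_ > $max }
        },
    } );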

You see similar results with sorting algorithms. Quicksort is well known to be the fastest general-purpose sorting algorithm for large sets, at O(n log n). However, the overhead incurred by quicksort is overkill for small datasets. For datasets with very few elements, bubble sort is often faster, even though bubble sort is O(n^2).
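For what it's worth, here is a purely illustrative sketch of that effect with both algorithms written in plain Perl, so each pays the same interpreter overhead; the five-element list is arbitrary and results will vary by machine and perl version:

    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Textbook quicksort: recursion and two greps of overhead per call.
    sub quick_sort {
        my @a = @_;
        return @a if @a <= 1;
        my $pivot   = shift @a;
        my @less    = grep { $_ <  $pivot } @a;
        my @greater = grep { $_ >= $pivot } @a;
        return ( quick_sort(@less), $pivot, quick_sort(@greater) );
    }

    # Textbook bubble sort: no recursion, just adjacent swaps.
    sub bubble_sort {
        my @a = @_;
        for my $i (0 .. $#a - 1) {
            for my $j (0 .. $#a - 1 - $i) {
                @a[ $j, $j + 1 ] = @a[ $j + 1, $j ] if $a[$j] > $a[$j + 1];
            }
        }
        return @a;
    }

    my @tiny = map { int rand 100 } 1 .. 5;    # very small dataset

    cmpthese( -1, {
        Quick  => sub { my @s = quick_sort(@tiny) },
        Bubble => sub { my @s = bubble_sort(@tiny) },
    } );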


RE: Re: Algorithm Efficiency vs. Specialized Hardware?
by Aighearach (Initiate) on Jun 20, 2000 at 23:25 UTC

    I ran a test doing exactly that, and increasing the %url population from 1000 to 100_000 caused each algorithm to take over 100 times longer.

    the change is:

    old:

    my %url = ( monday => { @{[map(($_,1), (1..1000))]} } );
    new:
    my %url = ( monday => { @{[map(($_,1), (1..100_000))]} } );

    I turned the iterations down for the large dataset, because otherwise I would be posting tomorrow instead of today. ;)
    results are:

    Large dataset
    Benchmark: timing 100 iterations of Grep, Max, Ternary...
          Grep: 83 wallclock secs (80.90 usr +  0.02 sys = 80.92 CPU) @  1.24/s (n=100)
           Max: 74 wallclock secs (71.38 usr +  0.00 sys = 71.38 CPU) @  1.40/s (n=100)
       Ternary: 77 wallclock secs (74.15 usr +  0.00 sys = 74.15 CPU) @  1.35/s (n=100)
    
    Small dataset
    Benchmark: timing 10000 iterations of Grep, Max, Ternary...
          Grep: 73 wallclock secs (68.50 usr +  0.05 sys = 68.55 CPU) @ 145.88/s (n=10000)
           Max: 61 wallclock secs (57.77 usr +  0.05 sys = 57.82 CPU) @ 172.95/s (n=10000)
       Ternary: 64 wallclock secs (61.54 usr +  0.02 sys = 61.56 CPU) @ 162.44/s (n=10000)
    

    Can you post the rest of the Benchmark output?

    Paris Sinclair    |    4a75737420416e6f74686572
    pariss@efn.org    |    205065726c204861636b6572
    I wear my Geek Code on my finger.
    
      Here's the code I used to run my test:
      use Benchmark qw(timethese);    # assumed import; timethese() needs it

      my @size = (100, 1000, 10000, 100000);
      foreach (@size) {
          my $size = $_;
          print "size=$size\n";
          my $now = 8;
          my %url = ( monday => { @{[ map( ($_, 1), (1 .. $size) ) ]} } );
          timethese(0, {
              Grep    => q{ $now = (sort grep { $_ <= $now } keys %{$url{'monday'}})[-1]; },
              Ternary => q{ $now = ($now < $_ && $_ < 8 ? $_ : $now) for keys %{$url{'monday'}}; },
              Max     => q{ foreach ( keys %{$url{'monday'}} ) { $now = $_ if $_ > $now }; },
          });
      }
      I am awfully suspicious of my tests taking almost the same amount of time to run no matter how many elements there were in the dataset.

      I think I am onto something... I don't think the code posted in the original question really benchmarks what the user thinks it does... I'm crafting a new reply to the original question that should be up soon.
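      A plausible explanation, though this is only my guess: the snippets are passed to timethese() as q{} strings, and Benchmark evals those strings in its own scope, where the lexical %url and $now declared with my are not visible. If so, keys %{$url{'monday'}} is walking an empty hash, which would explain why the numbers barely move with the dataset size. A minimal sketch of the usual workaround, passing coderefs so the closures capture the lexicals (the size of 1000 is just for illustration):

      use strict;
      use warnings;
      use Benchmark qw(timethese);

      my $size = 1000;    # hypothetical size, for illustration only
      my %url  = ( monday => { map { ($_, 1) } 1 .. $size } );
      my $now;

      timethese( 10_000, {
          # Coderef instead of a q{} string: the closure captures the
          # lexical %url, so the loop really walks all $size keys.
          Max => sub {
              $now = 0;
              foreach ( keys %{ $url{monday} } ) { $now = $_ if $_ > $now }
          },
      } );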
