What came (in) first, the push or the splice?

The other day, a co-worker (ysth) and I were looking at a bit of code. The results of our research turned out to be somewhat interesting and thought I'd share with y'all in case you happen to be interested.

The findings bring up a good programming point (particularly in Perl). Simply, it is very easy to program similar problems the same without looking at the direct context of what you are trying to do. In the rest of this posting, you will see that copying and clearing an array can be done a number of ways, whereas if you are looking for speed (which most of us are), depending on the size and contents of the array, the answer will be different. A summary of my findings are at the very bottom of the post if you don't want to read through all the rest of my words :)

Problem:
Basically, we were looking at a bit where we are trying to transfer one AoH to another (where the source was acting as a buffer). Looking through the history of the file, we noticed that the algorithem has changed over time. I said it was for 'speed' reasons mostly, whereas he recommended a 'faster' way. Not knowing which was truely faster, we decided to run a little benchmarking and here's what we came up with.

Again, the problem was to copy an AoH of n size (where n can range from 0 to a lot more than 0) while cleaning out the source AoH to ensure the buffer didn't sustain any old values. Simple enough. We decided to benchmark four different methods: a for with a </code>push</code>, a simple splice, a push with a re-assignment of an empty list; and a while with a shift.

Here's what we started with (Where we ran Benchmark::cmpthese() for three seconds over each of the following subs):

my @arr = qw( one two three );

sub while_with_shift_push {
        my $ref;
        my @cp = @arr;
        push( @$ref, $_ ) while( $_ = shift( @cp ) );
        return $ref;
}

sub for_with_push {
        my $ref;
        my @cp = @arr;
        push( @$ref, shift( @cp ) ) for ( 0..$#cp );
        return $ref;
}

sub splice_it {
        my $ref;
        my @cp = @arr;
        push( @$ref, splice( @cp ) );
        return $ref;
}

sub push_with_clear {
        my @cp = @arr;
        my $ref;
        push( @$ref, @cp );
        @cp = ();
        return $ref;
}
[download]

You can see we're basically testing four methods of doing the exact same thing to see what's faster.

As originally anticipated, splice() won in true form. Followed closely by push(), then while and for.

                          Rate for_with_push while_with_shift_push pus
+h_with_clear splice_it
for_with_push         160238/s            --                   -9%    
+        -32%      -43%
while_with_shift_push 175370/s            9%                    --    
+        -26%      -37%
push_with_clear       237079/s           48%                   35%    
+          --      -15%
splice_it             280139/s           75%                   60%    
+         18%        --
[download]

Of course, as I noted above, it's not really known what the size of the list will be. Therefore, for the next test, I increased the size of the list we were going to mull over to an array of integers 100000 deep. Not reality (yet), though will give a good example of what I am trying to express in this post. Using the same benchmark, here are the findings:

 
                        Rate for_with_push splice_it push_with_clear w
+hile_with_shift_push
for_with_push         15.8/s            --      -47%            -50%  
+                -77%
splice_it             29.6/s           88%        --             -5%  
+                -56%
push_with_clear       31.3/s           98%        6%              --  
+                -54%
while_with_shift_push 67.7/s          330%      129%            117%  
+                  --
[download]

Completely different once the size of the array has ballooned. Interesting.

OK, well I guess I could see that building a large list of constants rather than non-constants (such as the first test) really isn't a good comparison. Therefore, to try and emulate the first test, I will 're-test' with a list of constants (three integers). Here are the results from that:

for_with_push         131415/s            --                  -14%    
+        -43%      -45%
while_with_shift_push 152026/s           16%                    --    
+        -34%      -37%
push_with_clear       231941/s           76%                   53%    
+          --       -4%
splice_it             240402/s           83%                   58%    
+          4%        --
[download]

Similar to the small list of strings, not totally surprising.

The next (and final) step was to try this a little closer to reality and have the @arr array be an array of hashes rather than array of strings and/or integers. Here are the results of that:

                          Rate for_with_push while_with_shift_push spl
+ice_it push_with_clear
for_with_push          73074/s            --                   -1%    
+  -45%            -46%
while_with_shift_push  73539/s            1%                    --    
+  -45%            -46%
splice_it             133132/s           82%                   81%    
+    --             -2%
push_with_clear       136236/s           86%                   85%    
+    2%              --
[download]

Again, completely different from before.

The point:
I tested four ways of doing the same thing, yet in all cases the scope of my problem was just a little different, yet all had completely different results.

  1. splice seems to be best for small lists of both constants and non-constants.
  2. Make the same list a little bigger and the results are completely different.
  3. Change the list to different data-types and the results are, again, completely different.

The bigger point:
Essentially, what I am trying to maintain here, is always be aware of the scope of the problem you are dealing with. All four examples would work in all three (four-ish) of my tests, though there may have been one that was 'better' than the other. I constantly find myself skimming over the smaller details of the problem at hand, whereas if I took a step back, would allow for a better application to be built.

Have fun with that one, and I'll stop my rambling now.

print map{chr}(45,45,104,97,124,124,116,97,45,45);
... and I posted this while I was at work => whitepages.com | INC

Comment on What came (in) first, the push or the splice? Select or Download Code

Replies are listed 'Best First'.
Re: What came (in) first, the push or the splice? by ikegami (Patriarch) on May 08, 2006 at 16:35 UTC
A couple of notes: `splice_it` makes an entire copy of the array on the stack. It could have been hitting the limits of your system's memory in the test with the array of 100000 items. `while( $_ = shift( @cp ) );` is not equivalent to the others. It will end prematurely if a numeric zero (`0`), a string zero (`'0'`), a nul string (`''`) or an undefined value is encountered in `@cp`. Perhaps you should have used: `sub while_with_shift_push { my $ref; my @cp = @arr; push( @$ref, shift( @cp ) ) while @cp; return $ref; }` [download] Probably irrelevant, but `for ( 0..$#cp )` used more memory and was probably slower than `for (my $i=@cp; $i--; )` in older versions of Perl. Unless you plan on working with very long lists, I'd use `splice_it`. It's the easiest to read of the solutions with similar performance. (And this "unless" prooves your point.)	[reply] [d/l] [select]
Re^2: What came (in) first, the push or the splice? by wazzuteke (Hermit) on May 08, 2006 at 16:43 UTC
All good points and I'm glad you brought them up. I guess the broader point I was trying to make was to make sure you dig your mind into the scope of the real problem. There are a number of things that will change your problem, inlcuding the reliability of (in this case ) `@cp` (in my case it will very reliably be either an empty list or an AoH), the Perl version you're running, etc. Aggreed that `splice_it` is the best way for my case as well. I guess I was trying to express less of the best solution for my problem, but rather the worthyness of investigating multiple solutions for all problems in a more general sence. `print map{chr}(45,45,104,97,124,124,116,97,45,45);` ... and I posted this while I was at work => whitepages.com \| INC	[reply] [d/l] [select]
Re: What came (in) first, the push or the splice? by dragonchild (Archbishop) on May 08, 2006 at 17:12 UTC
What about the option where the source AoH is lexically scoped and is reaped when it goes out of scope? Also, for clearing it, what about `$#cp = -1;`? On some Perls, particularly older ones, that's the fastest way to make the contents of an array unaccessible. My criteria for good software: Does it work? Can someone else come in, make a change, and be reasonably certain no bugs were introduced?	[reply] [d/l]
Re: What came (in) first, the push or the splice? by spiritway (Vicar) on May 10, 2006 at 16:46 UTC
It's been pointed out here before, but in general it's more effective to develop an efficient algorithm, than to try to squeeze cycles out by trying different functions or operators. If you're really pressed for time, it would be better to switch to a faster language, like C. IMNSHO.	[reply]