The other day, a co-worker (
ysth) and I were looking at a bit of code. The results of our research turned out to be somewhat interesting and thought I'd share with y'all in case you happen to be interested.
The findings bring up a good programming point (particularly in Perl). Simply, it is very easy to program similar problems the same without looking at the direct context of what you are trying to do. In the rest of this posting, you will see that copying and clearing an array can be done a number of ways, whereas if you are looking for speed (which most of us are), depending on the size and contents of the array, the answer will be different. A summary of my findings are at the very bottom of the post if you don't want to read through all the rest of my words :)
Problem:
Basically, we were looking at a bit where we are trying to transfer one AoH to another (where the source was acting as a buffer). Looking through the history of the file, we noticed that the algorithem has changed over time. I said it was for 'speed' reasons mostly, whereas he recommended a 'faster' way. Not knowing which was truely faster, we decided to run a little benchmarking and here's what we came up with.
Again, the problem was to copy an AoH of
n size (where
n can range from 0 to a lot more than 0) while cleaning out the source AoH to ensure the buffer didn't sustain any old values. Simple enough. We decided to benchmark four different methods: a
for with a </code>push</code>, a simple
splice, a
push with a re-assignment of an empty list; and a
while with a
shift.
Here's what we started with (Where we ran Benchmark::cmpthese() for three seconds over each of the following subs):
my @arr = qw( one two three );
sub while_with_shift_push {
my $ref;
my @cp = @arr;
push( @$ref, $_ ) while( $_ = shift( @cp ) );
return $ref;
}
sub for_with_push {
my $ref;
my @cp = @arr;
push( @$ref, shift( @cp ) ) for ( 0..$#cp );
return $ref;
}
sub splice_it {
my $ref;
my @cp = @arr;
push( @$ref, splice( @cp ) );
return $ref;
}
sub push_with_clear {
my @cp = @arr;
my $ref;
push( @$ref, @cp );
@cp = ();
return $ref;
}
You can see we're basically testing four methods of doing the exact same thing to see what's faster.
As originally anticipated,
splice() won in true form. Followed closely by
push(), then
while and
for.
Rate for_with_push while_with_shift_push pus
+h_with_clear splice_it
for_with_push 160238/s -- -9%
+ -32% -43%
while_with_shift_push 175370/s 9% --
+ -26% -37%
push_with_clear 237079/s 48% 35%
+ -- -15%
splice_it 280139/s 75% 60%
+ 18% --
Of course, as I noted above, it's not really known what the size of the list will be. Therefore, for the next test, I increased the size of the list we were going to mull over to an array of integers 100000 deep. Not reality (yet), though will give a good example of what I am trying to express in this post. Using the same benchmark, here are the findings:
Rate for_with_push splice_it push_with_clear w
+hile_with_shift_push
for_with_push 15.8/s -- -47% -50%
+ -77%
splice_it 29.6/s 88% -- -5%
+ -56%
push_with_clear 31.3/s 98% 6% --
+ -54%
while_with_shift_push 67.7/s 330% 129% 117%
+ --
Completely different once the size of the array has ballooned. Interesting.
OK, well I guess I could see that building a large list of constants rather than non-constants (such as the first test) really isn't a good comparison. Therefore, to try and emulate the first test, I will 're-test' with a list of constants (three integers). Here are the results from that:
for_with_push 131415/s -- -14%
+ -43% -45%
while_with_shift_push 152026/s 16% --
+ -34% -37%
push_with_clear 231941/s 76% 53%
+ -- -4%
splice_it 240402/s 83% 58%
+ 4% --
Similar to the small list of strings, not totally surprising.
The next (and final) step was to try this a little closer to reality and have the
@arr array be an array of hashes rather than array of strings and/or integers. Here are the results of that:
Rate for_with_push while_with_shift_push spl
+ice_it push_with_clear
for_with_push 73074/s -- -1%
+ -45% -46%
while_with_shift_push 73539/s 1% --
+ -45% -46%
splice_it 133132/s 82% 81%
+ -- -2%
push_with_clear 136236/s 86% 85%
+ 2% --
Again, completely different from before.
The point:
I tested four ways of doing the same thing, yet in all cases the scope of my problem was just a little different, yet all had completely different results.
1.
splice seems to be best for small lists of both constants and non-constants.
2. Make the same list a little bigger and the results are completely different.
3. Change the list to different data-types and the results are, again, completely different.
The bigger point:
Essentially, what I am trying to maintain here, is always be aware of the scope of the problem you are dealing with. All four examples would work in all three (four-ish) of my tests, though there may have been one that was 'better' than the other. I constantly find myself skimming over the smaller details of the problem at hand, whereas if I took a step back, would allow for a better application to be built.
Have fun with that one, and I'll stop my rambling now.
print map{chr}(45,45,104,97,124,124,116,97,45,45);
... and I posted this while I was at work => whitepages.com | INC