G'day Eily,
++ I tried a few things independently; however, it appears that much of that is very similar to what you've done, so I'll post it here for comparison.
"my @keys = keys %$hash_ref;" was one of my first thoughts and this appeared to be a definite winner. I also tried inlining the results (now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"): that proved to be slower than using "@keys".
I had code very similar to your "sum map ...", although I used sum0. That appeared to be slower (even with the "map EXPR" form I used). I suspect any gains from sum were overshadowed by losses from map, but I didn't investigate that any further.
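For comparison, here's a minimal sketch of the kind of "sum0 map EXPR" inner loop I mean. It's a reconstruction rather than the exact code I timed: the sub name sum_for is made up for illustration, and the data is just a small stand-in shaped like the test data in this post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'sum0';

# Small stand-in data (assumption: same shape as the test data in this post).
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;

my ($foo, $value) = ({}, 0);
for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}

# Fetch the keys once, outside all loops.
my @keys = keys %$hash_ref;

# sum_for() is a hypothetical helper: it totals every {value} reachable
# from one 'v-w-x-y-z' style string.
sub sum_for {
    my ($string) = @_;
    my $sum = 0;
    for my $inner (split /-/, $string) {
        # "map EXPR, LIST" form; sum0 returns 0 (not undef) for an empty list.
        $sum += sum0 map $foo->{$inner}{$_}{value}, @keys;
    }
    return $sum;
}

print sum_for('v-w-x-y-z'), "\n";    # prints 1275
```

Whether this beats the plain nested "for" loops will depend on the data. With my test data it didn't.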
I didn't think of caching. That's a good idea, and might put "sum(0) map" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown).
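As a rough illustration of how caching might look (this is my guess at one way to do it, not necessarily what you had in mind): if the sum for each "-"-separated token doesn't change while the loop runs, it only needs to be computed once per distinct token. The names %sum_for_token, token_sum and $computed are made up for this sketch, and the array is cut down to 1,000 elements.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'sum0';

# Small stand-in data (assumption: same shape as the test data in this post).
my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;

my ($foo, $value) = ({}, 0);
for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}

my @keys = keys %$hash_ref;

my %sum_for_token;    # hypothetical cache: token => sum of its {value}s
my $computed = 0;     # counts how often a sum is really calculated

# token_sum() is a hypothetical helper: it calculates the per-token sum
# on first use and returns the cached figure on every later call.
# Assumption: $foo and @keys are not modified while the cache is in use.
sub token_sum {
    my ($token) = @_;
    return $sum_for_token{$token} //= do {
        ++$computed;
        sum0 map $foo->{$token}{$_}{value}, @keys;
    };
}

my $bar;
for my $outer (('v-w-x-y-z') x 1000) {
    my $sum = 0;
    $sum += token_sum($_) for split /-/, $outer;
    push @{ $bar->{$sum} }, $outer;
}

print "$_ => ", scalar @{ $bar->{$_} }, "\n" for keys %$bar;
print "sums computed: $computed\n";
```

With only five distinct tokens, the map runs five times in total. Real data with many distinct tokens would benefit far less, which is why the OP's actual data matters here.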
I dummied up some test data (based on the OP's description but, I'm sure, far from representative); ran some basic timings; and included some sanity checking. Here's the code I found to be fastest.
#!/usr/bin/env perl -l

use strict;
use warnings;

use Time::HiRes 'time';

my $hash_ref;
@$hash_ref{'a' .. 'j'} = 1 .. 10;

my $array_ref = [ ('v-w-x-y-z') x 2e6 ];

my $foo;
my $value = 0;

for my $outer ('v' .. 'z') {
    $foo->{$outer}{$_}{value} = ++$value for 'a' .. 'j';
}

my $t0 = time;
op_code();
my $t1 = time;
printf "op_code: %.6f\n", $t1 - $t0;
kens_code();
my $t2 = time;
printf "kens_code: %.6f\n", $t2 - $t1;

print '*** Compare ***';
printf "kens/op: %.6f%%\n", (($t2 - $t1) / ($t1 - $t0)) * 100;

sub op_code {
    my $bar;

    foreach my $a (@{ $array_ref }) {
        my $i = 0;
        foreach my $b (split('-', $a)) {
            foreach my $c (keys %{ $hash_ref }) {
                $i += $foo->{$b}->{$c}->{'value'};
            }
        }
        push @{ $bar->{$i} }, $a;
    }

    print '*** op_code ***';
    print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar;
}

sub kens_code {
    my $bar;

    my @keys = keys %$hash_ref;

    for my $outer (@$array_ref) {
        my $sum;
        for my $inner (split /-/, $outer) {
            for (@keys) {
                $sum += $foo->{$inner}{$_}{value};
            }
        }
        push @{$bar->{$sum}}, $outer;
    }

    print '*** kens_code ***';
    print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar;
}
With the two-million-element array, the OP's code took about 30s (so it's roughly comparable to what the OP describes). My code was typically shaving around 25-30% off of this. I ran it quite a few times; here's a fairly representative run.
*** op_code ***
1275 1999999 v-w-x-y-z
op_code: 31.985272
*** kens_code ***
1275 1999999 v-w-x-y-z
kens_code: 23.075756
*** Compare ***
kens/op: 72.144941%
By the way, I totally agree with your comments re $a and $b: I only use those as special variables. I'm not completely averse to single-letter variable names, such as $i for a loop index; although, I do cringe when I find them liberally scattered through production code — meaningful names are a much better choice.
— Ken
In reply to Re^2: Optimize a perl block by kcott in thread Optimize a perl block by IruP