While working on a complex script doing lookups and searches on a dozen arrays of hashes (each array representing a relational database table) I stumbled across an extremely simple improvement that instantly gave almost twice the performance.
The original loop looked like this:
sub filter { my $where = shift; my @in = @_; # This class method is used to filter an array of hashrefs against a + set of criteria defined in $where. # Example: # @matching_hosts = filter( { site => 56, type => 4 }, @all_hosts) +; # In this example, @matching_hosts will only contain those hashrefs +that would return TRUE for the following code: # ($_->{'site'} eq '56' && $_->{'type'} eq '4') # Note that the "eq" and "&&" are implied; no other operators are su +pported. # The order of the array is not affected. my @out = (); foreach my $record (@in) { my $keep = 1; foreach my $field (keys %{$where}) { unless ($record->{$field} eq $where->{$field}) { $keep = 0; last; } push @out, $record if $keep; } } return @out; }
The rewritten loop looks like this:
sub filter { my $where = shift; my @in = @_; # This class method is used to filter an array of hashrefs against a + set of criteria defined in $where. # Example: # @matching_hosts = filter( { site => 56, type => 4 }, @all_hosts) +; # In this example, @matching_hosts will only contain those hashrefs +that would return TRUE for the following code: # ($_->{'site'} eq '56' && $_->{'type'} eq '4') # Note that the "eq" and "&&" are implied; no other operators are su +pported. # The order of the array is not affected. my @out = (); # Make one pass per match term foreach my $field (keys %{$where}) { my $value = $where->{$field}; @out = grep { $_->{$field} eq $value } @in; @in = @out; # Prepare for next pass (if any) } return @out; }
The running times of actual reports dropped from over 4 seconds to less than 2 seconds. Some of that improvement obviously came from using the built-in grep{} function instead of manually checking each value and push()'ing hashrefs to the @out array, but I didn't expect that much of an improvement.
There had to be a different explanation, and that got me thinking about the cost of setting up and executing a foreach() loop:
$ cat foreach_inner #!/usr/bin/perl use strict; use warnings; foreach my $foo (1 .. 3) { foreach my $bar (1 .. 10000000) { my $pointless = "$foo.$bar"; } }
$ time ./foreach_inner real 0m8.975s user 0m8.954s sys 0m0.013s
$ cat foreach_outer #!/usr/bin/perl use strict; use warnings; foreach my $foo (1 .. 10000000) { foreach my $bar (1 .. 3) { my $pointless = "$foo.$bar"; } }
$ time ./foreach_outer real 0m14.106s user 0m14.092s sys 0m0.003s
Both test scripts do the exact same amount of (pointless) work, the difference between the two scripts is that 'foreach_inner' has to execute 9999997 more foreach() loops than 'foreach_outer'.
Sometimes, even a seemingly pointless improvement can make a significant difference if made in the right place.
Now, the way filters are specified in $where is pretty much nailed down because that hashref is built and used in a lot of different contexts. I am still looking for a way to express the whole thing as a single grep{} block to eliminate the looping altogether. Maybe tomorrow.
Time flies when you don't know what you're doing
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: On optimizing nested loops
by RichardK (Parson) on Oct 19, 2014 at 12:42 UTC | |
by FloydATC (Deacon) on Oct 20, 2014 at 06:58 UTC | |
by tobyink (Canon) on Oct 21, 2014 at 18:42 UTC | |
by FloydATC (Deacon) on Oct 22, 2014 at 20:21 UTC | |
|
Re: On optimizing nested loops
by Anonymous Monk on Oct 19, 2014 at 10:08 UTC | |
by FloydATC (Deacon) on Oct 20, 2014 at 06:54 UTC | |
|
Re: On optimizing nested loops
by Laurent_R (Canon) on Oct 25, 2014 at 15:20 UTC |