Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Performance issue in the loop.

by rsFalse (Chaplain)
on Feb 22, 2022 at 19:39 UTC ( [id://11141560] : perlquestion . print w/replies, xml ) Need Help??

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:


Today I tried to optimize slowest part of my program. That is inside of nested loops:
#!/usr/bin/perl use warnings; use strict; $\ = $/; srand( 1 ); my $n = 2e3; @_ = map int rand 1e5, 1 .. $n; print "@_[ 0 .. 3 ] ..."; my @max; push @max, ( sort { $b <=> $a } @_ )[ 0 ]; my @sums = @_; for my $i ( 1 .. $n - 1 ){ my $max = -~0; for my $j ( $i .. $n - 1 ){ # HOT-SPOT { $sums[ $j - $i ] += $_[ $j ]; $sums[ $j - $i ] > $max and $max = $sums[ $j - $i ]; # HOT-SPOT } } push @max, $max; } print "Finished";
I changed the HOT-SPOT lines to:
( $sums[ $j - $i ] += $_[ $j ] ) > $max and $max = $sums[ $j - + $i ];
...and my code started to work ~30 percent faster. I think it is good increase in performance but a slight decrease in readability.

I executed these code dozens of times to see how fast do they work. I observed that both programs work in not constant time and about every 10th run program occasionally consume >1.5x time than usually. I observed it on perl v5.30 and v5.32, but not on v5.14. Any ideas why every ~10th run any of these programs runs much slower?

Replies are listed 'Best First'.
Re: Performance issue in the loop.
by NERDVANA (Deacon) on Feb 22, 2022 at 21:19 UTC
    You know what would really improve the readability? Comments :-)

    As for performance, it would probably be better to use List::Util's max() at the end of the inner loop instead of building the max while iterating. You pay for the array access and the subtraction each time you write $sums[$j-$i] You also pay each time you open a scope with curly braces. So maybe:

    use List::Util 'max'; ... $sums[ $_ - $i ] += $_[ $_ ] for $i .. $n-1; push @max, max(@sums[0 .. $n-1-$i]);

    I don't know why the run-time would vary every 10th run for you, but if it's in a terminal and you are printing to the screen, your desktop environment could have an effect maybe. I ran this example a bunch of times and they came out to pretty much the same ms each time. But, this example as written only takes 175ms for me which is too tiny to really use as a benchmark. I assume you run with a larger $n?

    Also I don't recommend using @_ when you aren't dealing with function arguments. Maybe call it @input or something?

      Thanks for suggestions.

      I ran your code and it worked a little bit slower than my program (optimized variant).
      Also I observed that by few-fold increasing $n, the factor of ~1.5x isn't changing.
      P.S. I used v5.30.
        Ah, nevermind, I realize I was comparing to your old code, not your optimized one. Yes the optimized one gets rid of the double check of $sums$j-$i so that works too.

        Still, List::Util::max is faster than (sort...)[0]

Re: Performance issue in the loop.
by Fletch (Bishop) on Feb 22, 2022 at 20:15 UTC

    Nothing in your code jumps out but whenever you get weird, inexplicable delays check that there's not something reliant on a network connection (locally or to the outside world). A not completely synthetic example; I've had something similar happen: if you have something trying to resolve a hostname (for whatever reason) and your DNS is intermittently unavailable or timing out you can get inconsistent runtimes from that. Also you want to make sure you're using your shell's builtin time command to get execution time of just your script. If you're doing something like date ; perl mytest.plx ; date and then manually computing the time there could be something (maybe your shell has a prompt command that does something network-y) that's making you see a delay but it's not in your code.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

      Ye, I used 'time' command: time perl
        How about you use time instead to time some subs?

        using a laptop? OS got CPU throttling feature? It can throw off benchmarks significantly. You can tell full throttle difference by fan noise.

Re: Performance issue in the loop.
by vr (Curate) on Feb 23, 2022 at 11:43 UTC

    If, based on your answers in this thread, ultimate goal is better speed after all (and not mysterious slow-downs on every 10th run, about which I have nothing to add), then as usual in case of large arrays, use PDL;. Second pair of results are for 10x $n.

    use warnings; use strict; use feature 'say'; use Time::HiRes 'time'; srand( 1 ); my $n = 2e3; my @input = map int rand 1e5, 1 .. $n; my ( $check1, $check2 ); { # original code my $t = time; my @max; push @max, ( sort { $b <=> $a } @input )[ 0 ]; my @sums = @input; for my $i ( 1 .. $n - 1 ){ my $max = -~0; for my $j ( $i .. $n - 1 ){ # HOT-SPOT { ( $sums[ $j - $i ] += $input[ $j ] ) > $max and $max = $sums[ $j - $i ]; # HOT-SPOT } } push @max, $max; } say time - $t; $check1 = pack 'I*', @max; } { # PDL my $t = time; use PDL; use PDL::NiceSlice; my $sums = zeroes long, scalar @input; my $vec = pdl long, \@input; my @max = map $sums( 0 : -$_ - 1 ) -> inplace -> plus( $vec( $_ : -1 ), 0 ) -> maximum , 0 .. $#input; say time - $t; $check2 = pack 'I*', @max; } use Test::More; is $check1, $check2; done_testing; __END__ 0.148595094680786 0.0387320518493652 ok 1 1..1 14.5055229663849 0.759373903274536 ok 1 1..1
      Moar PDL!

      Here's my tweak of vr's excellent code, for use in the perldl REPL, which doesn't feature the check, just does timing (and doesn't use strict because of the REPL), and uses "+=" instead of the wordier "->inplace->plus":

      $n = 2e4; @input = map int rand 1e5, 1 .. $n; $sums = zeroes long, scalar @input; $vec = pdl long, \@input; with_time { my @max = map +($sums( 0 : -$_ - 1 ) += $vec( $_ : -1 ))-> +maximum, 0 .. $#input };
      I also increased the $n.

      EDIT: in PDL 2.077 with_time is always available in the PDL shells so I removed the definition of it here.

Re: Performance issue in the loop.
by bliako (Monsignor) on Feb 23, 2022 at 09:36 UTC
    ...and my code started to work ~30 percent faster.

    my guess is that it is because you have eliminated 1 out of 4 array accesses. But then with all sorts of caching with modern CPUs and OS, it's quite spectacular that it works as in the textbook case (1/4=25%) and even better! Perhaps I miss something ...

    How about using a temp variable, declared outside of the scope, to cache $sums[ $j - $i ], like: ( $tmp = $sums[ $j - $i ] += $_[ $j ] ) > $max and $max = $tmp; Does it make a difference?

    Any ideas why every ~10th run any of these programs runs much slower?
    push @max, $max;

    this has to extend @max from time to time. And since all data is random, its size is unpredictable. But this can not explain the regularity of performance decrease (re: > 1.5x every 10th time). On the other hand, the interaction of two or more uniform distributions create normal-distribution effects (re: Central Limit Theorem).

    The least you can do is to print the size of @max at the end of each program run and see if there is some correlation with running times?

    If your algorithm allows it, you may want to switch the loops order. First $j and then $i so that $_[ $j ]; is cached in the outer loop and you eliminate a lot of array access.

    Regarding timing your program, I use this:

    use Time::HiRes qw/time/; my $start_time = Time::HiRes::time(); ... my $time_taken = Time::HiRes::time() - $start_time; # floating seconds

    bw, bliako

Re: Performance issue in the loop.
by 1nickt (Canon) on Feb 22, 2022 at 20:08 UTC

    "Any ideas why every ~10th run any of these programs runs much slower?"

    What else is the computer doing?

    The way forward always starts with a minimal test.
      Idk. Firefox, etc.
      So I tried to compare running times across different perl versions.
Re: Performance issue in the loop.
by Your Mother (Archbishop) on Feb 22, 2022 at 20:41 UTC

    I imagine, but have no evidence, this would speed things up a bit.

    my $max; for my $i ( 1 .. $n - 1 ) { $max = -~0;
      Thanks. I tried your suggested code and I didn't observe an increase in performance.
Re: Performance issue in the loop.
by harangzsolt33 (Chaplain) on Feb 24, 2022 at 13:45 UTC
    I have noticed that working with arrays is slow in Perl, but if there is any way you could rewrite your code so that instead of an array of numbers, you use a string (an array of characters), that would make it faster, I think. Use vec($STRING, $PTR, 8) = $NewValue to change values within the array OR $MyValue = vec($STRING, $PTR, 8) to read values from the string. The problem is that a character can only go from 0 to 255, so if you want a larger range, you would use 16-bit ints or 32-bit ints to store the numbers. If you use 32-bit ints, vec($STRING, $PTR, 32) will do the work fine. The string will be 4X longer, and keep in mind that when $PTR = 4, it points to the 5th 32-bit int, not to the 2nd 32-bit int.