Re^3: Optimize a perl block

Hi kcott, thanks for your answer :).

I also tried inlining the results (now discarded, but it was something like: "sub hrkeys () { keys %$hash_ref }"): that proved to be slower than using "@keys".

The () prototype makes a function a candidate for inlining, but only if the function doesn't have side effects and returns a constant value. keys can both have side effects (when working on a tied hash), and isn't guaranteed to return a constant value so your function couldn't be inlined.

I didn't think of caching. That's a good idea, and might put "sum(0) map" back in the picture; however, as you stated, that will depend on the OP's data (which hasn't been shown).

Well I used your code to try, and the runtime is divided by ten. But that's because all the keys in the outer hash are identical, so the the sum is only computed once, and the cache is used everytime after that. I doubt the OP's data is that redundant :). Here's my code BTW:

sub eilys_code {
    my $bar;
    my %cache;
    my @keys = keys %$hash_ref;

    for my $outer (@$array_ref) {
        my $sum;
        for my $inner (split /-/, $outer) {
          $sum = $cache{$inner} //= sum map { $_->{value} } @{ $foo->{
+$inner} }{@keys};
        }
        push @{$bar->{$sum}}, $outer;
    }

    print '*** kens_code ***';
    print "@{[ $_, $#{$bar->{$_}}, $bar->{$_}[0] ]}" for keys %$bar;
}
[download]

The fact that with this sample data all the sums are identical makes this test less representative though, because push @{$bar->{$sum}}, $outer; autovivifies an array only once.

I'm not completely averse to single-letter variable names, such as $i for a loop index

I think $x, $y and $z as coordinates are fine. I still try to avoid $i though, because most of the time you don't need a loop index in perl, so if one is needed, there's probably a meaningful name to explain why.

Comment on Re^3: Optimize a perl block Select or Download Code

Replies are listed 'Best First'.
Re^4: Optimize a perl block by kcott (Archbishop) on Sep 07, 2017 at 23:26 UTC
"... keys ... isn't guaranteed to return a constant value so your function couldn't be inlined." I disagree with that in part. I did take into consideration this from the keys doco: "Hash entries are returned in an apparently random order." Along with this, a couple of sentences later: "So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other. " A very quick and dirty test (which I ran a number of times) with `$ perl -e 'my %x = qw{@ 1 \| 2}; for (1..30) { print keys %x }; print " +\n"'` [download] always returned one of these two: `\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@ @\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|@\|` [download] So, that's a reasonable confirmation of the doco, and the fact that my Perl version (`5.26.0`, by the way) is adhering to this. Anyway, I went back to the code I posted earlier, and added the subroutine requesting inlining back in; however, this time I wrote it like this, so that `@keys` was constant and couldn't possibly be altered elsewhere due to the lexical scoping of the anonymous block: `{ my @keys = keys %$hash_ref; sub hrkeys () { @keys } }` [download] There was no improvement on the OP's code. I ran it a few times, the "`kens/op:`" value was now showing 100% (±2%). With regard to the caching, that's the reason I didn't spend any time trying to check it. The minimal sanity checking (that the OP's and my code were producing equivalent results) showed `%$bar` had one key (`"1275"`) and its arrayref value held all of the two million elements. — Ken	[reply] [d/l] [select]
Re^5: Optimize a perl block by Eily (Monsignor) on Sep 08, 2017 at 08:47 UTC
I was thinking about the fact that the the output of keys can change when the input changes. As in `sub hrkeys () { keys %$hash_ref }`, where you just have to change the content of %$hash_ref to change the output. But my wording wasn't clear.	[reply] [d/l]