in reply to Some code optimization

Reserving the right to change my mind if and when you post actual code, the simplest way of optimising your loop would (probably) be to in-line the bodies of the two functions in the loop.

Function call overhead is relatively high in Perl. If the function itself does very little--as appears to be the case from your description--then the overhead of setting up and tearing down the "stack frame" can be higher than that for the code inside. By in-lining the code, you avoid that overhead, and for small functions, there is little downside. Especially if this is the only place those subs are called.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Replies are listed 'Best First'.
Re^2: Some code optimization
by almut (Canon) on Jun 17, 2010 at 16:34 UTC
    Function call overhead is relatively high in Perl.

    I could be wrong, but if I understand the OP correctly, the function needs in the order of several seconds for just 50 calls.  Perl's function call overhead may be high, but it shouldn't be that high...

      if I understand the OP correctly,

      Hence the reason for reserving the right to change my mind. I couldn't make head nor tails of this:

      I get 1 seconds per 50 iterations. Obviously better than the 28 seconds, but still significantly higher than the 9 seconds

      It also doesn't ring true that

      The function itself is quite simple - it gets 3 scalars (one of which is a field of some object), checks a couple of if's on the values and some basic math (including modulo %), and returns an array with a couple of hashes, each with two numerical fields. That's it.

      Would require 28 seconds/50 iterations.

      A tentative attempt to match the description yields :

      sub x{ my( $s1, $s2, $s3 ) = @_; $s1 *= 10 if $s1; $s2 **= 4 if $s2; $s3 *= $s2 % $s1; return [ { n1 => $s1, n2 => $s2 }, { n1 => 41 * $s2, n2 => ( $s1 * $s2 ) % $s3 } ]; };; $I = 50e3; $t = time; x( 123, 456, 789 ) for 1 .. $I; printf "$I calls took %.4f seconds\n", time()-$t;; 50000 calls took 0.3174 seconds $I = 50e5; $t = time; x( 123, 456, 789 ) for 1 .. $I; printf "$I calls took %.4f seconds\n", time()-$t;; 5000000 calls took 26.5590 seconds

      I appreciate my sub is guesswork and nothing like the real thing, but given the description, it is hard to guess what else that description is hiding that causes the sub to take a million times longer?

      Of course, it'll turn out that he's tallying the national debt. Or crowd sourcing the math to Twitter.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Some code optimization
by roibrodo (Sexton) on Jun 18, 2010 at 07:10 UTC
    use strict; use warnings; use List::Util qw(max min); use Time::HiRes qw(time); # this builds a structure that is usually retrieved from disk. # in this example we will use this structure again and again, # but in the real program we obviously retrieve a fresh structure # at each iteration my $simulation_h = {}; for ( 1 .. 70000 ) { my $random_start = int( rand(5235641) ); my $random_length = int( rand(40000) ); push @{ $simulation_h->{$random_start} }, $random_length; } my $zone_o = { _chromosome_length => 5235641, _legal_range => [ { FROM => 100000, TO => 200000 } ] }; my $start_time = time; scenario(); print "total loop time: " . ( time - $start_time ) . " seconds\n"; sub scenario { for ( my $i = 0 ; $i < 50 ; $i++ ) { print "i=$i time=" . ( time - $start_time ) . " seconds\n"; # originally there was a retreive of $simulation_h from disk h +ere # iterate genes foreach my $gene_from ( keys %{$simulation_h} ) { foreach my $gene_length ( @{ $simulation_h->{$gene_from} } + ) { # next; my $temp_gene_to_legal_range = gene_to_legal_range( $gene_from, $gene_length, $zone_o->{_chromosome_length} ); # next; # is_contained( $temp_gene_to_legal_range, $zone_o->{_legal_range}, $zone_o->{_chromosome_length} ); } } } } sub gene_to_legal_range($$$) { # return []; my ( $gene_from, $gene_length, $legal_length ) = @_; my $ret; my $gene_to = ( ( $gene_from + $gene_length - 1 ) % ($legal_length +) ) + 1; if ( $gene_to < $gene_from ) { # split # low range first $ret = [ { FROM => 0, TO => $gene_to }, { FROM => $gene_from, TO => $legal_length } ]; } else { # single $ret = [ { FROM => $gene_from, TO => $gene_to } ]; } return $ret; } sub is_contained ($$$) { my ( $some_legal_range, $gene_legal_range, $legal_length ) = @_; if ( legal_range_length($some_legal_range) == 0 ) { return 0; } my $intersection_legal_range = intersect_legal_ranges( $some_legal_range, $gene_legal_range, $legal_length ); if ( legal_range_length($intersection_legal_range) == legal_range_length($some_legal_range) ) { return 1; } return 0; } sub legal_range_length($) { my ($legal_range_a) = @_; my $length = 0; foreach my $simple_range_h ( @{$legal_range_a} ) { $length += ( $simple_range_h->{TO} - $simple_range_h->{FROM} ) +; } return $length; } sub intersect_legal_ranges($$$) { my ( $legal_range_1, $legal_range_2, $legal_length ) = @_; my $intersections_a = []; for ( my $i = 0 ; $i < scalar( @{$legal_range_1} ) ; $i++ ) { for ( my $j = 0 ; $j < scalar( @{$legal_range_2} ) ; $j++ ) { my $intersect_h = intersect_simple_ranges( ${$legal_range_ +1}[$i], ${$legal_range_2}[$j] ); push @{$intersections_a}, $intersect_h; } } $intersections_a = flatten_simple_ranges( $intersections_a, $legal +_length ); } # sub intersect_simple_ranges($$) { my ( $simple_range_1, $simple_range_2 ) = @_; my $from = max( $simple_range_1->{FROM}, $simple_range_2->{FROM} ) +; my $to = min( $simple_range_1->{TO}, $simple_range_2->{TO} ); if ( $from >= $to ) { # empty range return undef; } else { return { FROM => $from, TO => $to }; } } # get an array of simple ranges representing a single legal range # reurn a legal range (assuming each of the given ranges is simple and + legal) sub flatten_simple_ranges($$) { my ( $simple_range_a, $legal_length ) = @_; my $legal_range = []; my $last_to = undef; # filter out undef $simple_range_a = [ grep { $_ } @{$simple_range_a} ]; # sort by from $simple_range_a = [ sort { $a->{FROM} <=> $b->{FROM} } @{$simple_r +ange_a} ]; for ( my $i = 0 ; $i < scalar( @{$simple_range_a} ) ; $i++ ) { my $from = ${$simple_range_a}[$i]->{FROM}; my $to = ${$simple_range_a}[$i]->{TO}; # check if first range to process if ( !defined $last_to ) { push @{$legal_range}, { FROM => $from, TO => $to }; $last_to = $to; } elsif ( $from <= $last_to && $to > $last_to ) { # range overlap and longer, extend range my $last_h = pop @{$legal_range}; $last_h->{TO} = $to; push @{$legal_range}, $last_h; $last_to = $to; } elsif ( $from <= $last_to ) { # range overlap but contained, do nothing } else { # non overlap ($from > $last_to) # start new range push @{$legal_range}, { FROM => $from, TO => $to }; $last_to = $to; } } return $legal_range; }
    all commented parts commented (i.e. full loop): total loop time: 63.6951129436493 seconds
    with the second next uncommented: total loop time: 15.1547348499298 seconds with the second next uncommented and return []; in sub gene_to_legal_range also uncommented: total loop time: 6.83389496803284 seconds
    with the first next uncommented: total loop time: 4.58600687980652 seconds

      In your initial question you stated, "The function itself is quite simple - it gets 3 scalars (one of which is a field of some object), checks a couple of if's on the values and some basic math (including modulo %), and returns an array with a couple of hashes, each with two numerical fields. That's it." But then the code you presented here shows a quagmire of complexity of loops within loops, greps within loops (which is another form of loop), sorting within loops (which implies more loops), and so on. Having first read your initial question (with no code posted), and then later reading the code you posted, I couldn't believe my eyes. My first thought was, "This must be a practical joke. We're being suckered." What you're calling "checking a couple of ifs and some basic math" is actually nested loops with sorting and greping inside of them. It couldn't get much worse from an efficiency standpoint than that.

      Let's imagine a really simple case in this code fragment:

      my $n = 100; my $c = 0; foreach (0 .. $n) { $c++; } print "$c\n";

      That loop executes in O(n) time. Now consider this loop:

      my $n = 100; my $c = 0; foreach my $outter (0 .. $n) { foreach my $inner (0 .. $n) { $c++; } } print "$c\n";

      That chunk of code executes in O(n^2) time. That means that for 'n' there are n^2 iterations. If your code stopped there, you would still be questioning why it's taking so long. But it doesn't. I didn't wander through every level of looping, but I think your efficiency is going quadratic, which is to say, inefficient. I don't know how you could equate multiple layers of nested loops with "The function itself is quite simple..." These are mutually exclusive conditions.

      It could be there is a more efficient algorithm out there to solve the problem you're tackling. OR, it could be that you have one of those problems for which there is no efficient solution. If that's the case, thank goodness you've got a computer to do it for you. ;)


      Dave

        Let's look at a single iteration and define n as the number of genes. A gene is a start point + length, i.e. n is the number of elements pushed in the creation of $simulation_h (i.e. n=70,000 in the example).

        The main loop is indeed a nested loop, but it simply gets a single gene at a time from the hash, hence this double loop itself takes O(n) (before doing anything).

        Now, let's look at what is done inside the loop. We call two "problematic" subroutines, which in turn, call some other subroutines. The "loops" you see in the other subroutines are used to deal with the case when genes are split (start too close to the "end" of a circular the chromosome, thus have to be represented by two ranges instead of one). A range can therefore be either a singleton (one array element) or "split" (two array elements).

        So, yes, there are loops, but they sure don't add another factor of n to the complexity, since these loops are bounded by a small constant -- each loop is iterated for maximum of 2 iterations -- so even the double loop that checks a pair of ranges against each other (each range is "split" in the worse case scenario) is iterated for maximum of 4 iterations.

        To conclude, the complexity is actually O(n) per iteration (again, n is the number of genes).

      So the "is_contained()" call is costing a lot -- and it's doing some unnecessary work:
      sub is_contained ($$$) { my ( $some_legal_range, $gene_legal_range, $legal_length ) = @_; ### save result for later use: my $range_length = legal_range_length($some_legal_range) or return 0; # if ( legal_range_length($some_legal_range) == 0 ) { # return 0; # } my $intersection_legal_range = intersect_legal_ranges( $some_legal_range, $gene_legal_range, $legal_length ); return ( legal_range_length($intersection_legal_range) == $range_l +ength ); # if ( legal_range_length($intersection_legal_range) == # legal_range_length($some_legal_range) ) ### <- already did t +his call # { # return 1; # } # return 0; } # ... sub intersect_legal_ranges($$$) { my ( $legal_range_1, $legal_range_2, $legal_length ) = @_; my $intersections_a = []; my $intersect_h; ### let's declare this in advance for ( my $i = 0 ; $i < scalar( @{$legal_range_1} ) ; $i++ ) { for ( my $j = 0 ; $j < scalar( @{$legal_range_2} ) ; $j++ ) { $intersect_h = intersect_simple_ranges( ${$legal_range_1}[ +$i], ${$legal_range_2}[$j] ) and ### only push what you +'ll use push @{$intersections_a}, $intersect_h; } } $intersections_a = flatten_simple_ranges( $intersections_a, $legal +_length ); } sub flatten_simple_ranges($$) { my ( $simple_range_a, $legal_length ) = @_; my $legal_range = []; my $last_to = undef; # # filter out undef # $simple_range_a = [ grep { $_ } @{$simple_range_a} ]; ### <- not + needed ...
      When I ran the original code on my macbook, reported a total loop time of 95 sec.

      After making the few changes mentioned above (remove a redundant function call, don't push undef onto an array, remove the grep that filters out the undef items that aren't being pushed now), the total loop time dropped to 80 sec.

      If I also move the substance of the "intersect_simple_ranges()" function into the one block where that function is called (i.e. eliminate that one function call), it drops to 78 sec. Changing the {FROM, TO} AoH to a (less legible) AoA trimmed it to 75.

      Beyond that, I don't see anything obvious, but I haven't taken the time to grok the overall algorithm (let alone comprehend the ultimate goal). There might be easier ways to accomplish whatever you're doing here, e.g. by perhaps using a simpler data structure.

      UPDATE: Something to consider would be whether there's a way to make sure the required sorting is handled somewhere other than the inner-most loop, if at all possible. That is, can the algorithm be made to work in such a way that the data is sorted once before going into a given loop, thereby making it unnecessary to do repeated sorts within the loop?

        Thank you, I will look into it. But I think the problem starts with the first problematic function (as the name suggests :)) - gene_to_legal_range.

        When I next just before it I run 6 times faster than if I next just after it (~4 seconds compared to ~27 seconds for 50 iterations). This subroutine really looks simple to me, so I don't see why it has to take so much time (subroutine overhead?!).

        p.s. I now run my home PC and the benchmarking results are a little different (but the relative differences are similar).

        UPDATE

        Almost always the sort is done on an array with a single element (there are cases where there are more). I even tried removing the sort at all, just to see the effect on running time - there is no change.

        But, again, I suggest we start with the first, more simple subroutine. Is it really just perl subroutine overhead? Is there anything I can do about it?