comment on

So the "is_contained()" call is costing a lot -- and it's doing some unnecessary work:

sub is_contained ($$$) {
    my ( $some_legal_range, $gene_legal_range, $legal_length ) = @_;

    ### save result for later use:
    my $range_length = legal_range_length($some_legal_range)
        or return 0;

#    if ( legal_range_length($some_legal_range) == 0 ) {
#        return 0;
#    }

    my $intersection_legal_range =
      intersect_legal_ranges( $some_legal_range, $gene_legal_range,
        $legal_length );

    return ( legal_range_length($intersection_legal_range) == $range_l
+ength );

#    if ( legal_range_length($intersection_legal_range) ==
#        legal_range_length($some_legal_range) )  ### <- already did t
+his call
#    {
#        return 1;
#    }
#    return 0;
}

# ...

sub intersect_legal_ranges($$$) {
    my ( $legal_range_1, $legal_range_2, $legal_length ) = @_;

    my $intersections_a = [];

    my $intersect_h;  ### let's declare this in advance

    for ( my $i = 0 ; $i < scalar( @{$legal_range_1} ) ; $i++ ) {
        for ( my $j = 0 ; $j < scalar( @{$legal_range_2} ) ; $j++ ) {
            $intersect_h = intersect_simple_ranges( ${$legal_range_1}[
+$i],
                ${$legal_range_2}[$j] ) and     ### only push what you
+'ll use
                push @{$intersections_a}, $intersect_h;
        }
    }
    $intersections_a = flatten_simple_ranges( $intersections_a, $legal
+_length );
}

sub flatten_simple_ranges($$) {
    my ( $simple_range_a, $legal_length ) = @_;

    my $legal_range = [];
    my $last_to     = undef;

#    # filter out undef
#    $simple_range_a = [ grep { $_ } @{$simple_range_a} ];  ### <- not
+ needed
    ...
[download]

When I ran the original code on my macbook, reported a total loop time of 95 sec.

After making the few changes mentioned above (remove a redundant function call, don't push undef onto an array, remove the grep that filters out the undef items that aren't being pushed now), the total loop time dropped to 80 sec.

If I also move the substance of the "intersect_simple_ranges()" function into the one block where that function is called (i.e. eliminate that one function call), it drops to 78 sec. Changing the {FROM, TO} AoH to a (less legible) AoA trimmed it to 75.

Beyond that, I don't see anything obvious, but I haven't taken the time to grok the overall algorithm (let alone comprehend the ultimate goal). There might be easier ways to accomplish whatever you're doing here, e.g. by perhaps using a simpler data structure.

UPDATE: Something to consider would be whether there's a way to make sure the required sorting is handled somewhere other than the inner-most loop, if at all possible. That is, can the algorithm be made to work in such a way that the data is sorted once before going into a given loop, thereby making it unnecessary to do repeated sorts within the loop?

In reply to Re^3: Some code optimization by graff
in thread Some code optimization by roibrodo

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.