VincentK has asked for the wisdom of the Perl Monks concerning the following question:

Good day.

I am working on a problem that is just an intellectual challenge. I am not asking for help on solving the actual problem, rather just help on optimizing a loop. As it stands, running the script over the input takes a few hours to complete.

After the initial loading the array @input_connections, will contain pairs of numbers where each element is in a format like '7-12' (starting connection-ending connection). The code snippet below reads through each of these pairs and culls out the ones that are already implied by the other pairs

Whereas if the pair '7-12' was stored and the next pair is '7-10', '7-10' would be rejected because '7-10' falls within the range of '7-12'.

The array @used_connections stores numbers that have been used in each pair. The array @output_connections stores the connections that are to be output from the processing.

Ideally, I'd love to have the connections stored in a hash and be able to check for used numbers with something like ' if exists $hash{key1...keyN} ' thereby removing a loop for checking. I am seeking wisdom to see if there is ( and I am sure there is ) a more efficient way to go about this part.

Any suggestions will be most appreciated. Thanks!

I apologize if this does not make sense.

Code snippet

############################################################### my $pair_no = 1; foreach my $conn (@input_connections) { print "\rON pair $pair_no"; my @curr_conn = split(/\-/,$conn); ## check if current connection exists my $exists = 1; my $start = $curr_conn[0]; my $end = $curr_conn[1]; ## first number in pair is larger than second, reverse them if ($curr_conn[0] > $curr_conn[1]) { $start = $curr_conn[1]; $end = $curr_conn[0]; } ## ** Ideally I want to do away with this loop ** ## for ($start..$end) { if ( not defined $used_connections[$_] ) { $exists = 0; last; } } if ($exists == 0) { for ($start..$end) { $used_connections[$_] = 1; # I tried something like @used_connections[$start..$end] = + 1 , but # it did not seem to work the same as what is in place now +. } push @output_connections , $conn; } $pair_no++; } ###############################################################

Replies are listed 'Best First'.
Re: Request help on optimizing a loop for looking up numbers
by hdb (Monsignor) on Dec 11, 2013 at 16:14 UTC

    As a start have a look at this:

    use strict; use warnings; my %conns; while(<DATA>){ $conns{$1} = $2 if /(\d+)-(\d+)/ and $2 > ($conns{$1}//0); } printf "%d:\t%d\n", $_, $conns{$_} for sort keys %conns; __DATA__ 4-12 7-10 7-12 8-13

    Basically, for each start of interval you store the biggest end of interval. It is not clear to me whether you want to check for other, more comprehensive types of overlaps.

      Thank you hdb for your reply.

      Very clever! One liners never cease to amaze me. I do have more checks, but in reference to my reply to LaurentR, I wanted to see if there is a way to look up multiple hash keys in one step.

      Thank you

Re: Request help on optimizing a loop for looking up numbers
by Laurent_R (Canon) on Dec 11, 2013 at 16:01 UTC
    Hi, if I understood correctly your problem you could just create a hash whose keys are made of the concatenated start and end, with a appropriate separator, e.g. "7-12". The values can be anything, it could be 1. Then you just check for the existence of the hash for the concatenated pair of numbers.
    if (exists $hash{"start-$end"} ) { # do something }

      Thank you for your reply Laurent_R.

      That is close to what I want. Again, I probably did not do a good enough job explaining what I want. Sorry. The look up you have is close, but I want a check similar to yours that will check for the numbers between the range. Given '7-12', the code will check for 7,8,9,10,11, and 12.

      Is there something I can use that is similar to your check, but will essentially look for this ?

       if ( exists $hash{7} || exists $hash{8}... exists $hash{12} )

      Ideally

       if ( exists $hash{7..12} )

      Thanks.
        Hash slices might be what you need. This is an example of how they work demonstrated under the Perl debugger:
        DB<1> @hash{qw/one two three/} = qw /1 2 3/; DB<2> x \%hash 0 HASH(0x302ee308) 'one' => 1 'three' => 3 'two' => 2 DB<3>

        You can do something like

        if (grep exists $hash{$_}, 7..12) {

        but it's not necessarily any faster.

        A hash is an unordered data type, so there's no easy way to search for a range. (There are some "ordered hash" packages on cpan, but that's really a misleading name.)

Re: Request help on optimizing a loop for looking up numbers
by oiskuu (Hermit) on Dec 11, 2013 at 17:33 UTC
    This problem is solvable with an Interval tree. Google search points to Set::IntervalTree and Tree::Interval. I haven't used either and could not give a recommendation, though.

    Your code might look like this:

    my $r = $tree->fetch($x->lo, $x->hi); if (@{$r}) ... # overlap case: perform the necessary checks $tree->insert($x, $x->lo, $x->hi);

    Edit: If ranges can overlap in any manner and appear in any order, you'll need to figure out which are to be kept and which rejected. If you merge the overlapping ranges, then the interval tree ought to perform well.
Re: Request help on optimizing a loop for looking up numbers
by Laurent_R (Canon) on Dec 11, 2013 at 18:51 UTC
    The Set::IntSpan module might very well be what you are after.
Re: Request help on optimizing a loop for looking up numbers
by choroba (Cardinal) on Dec 11, 2013 at 18:55 UTC
Re: Request help on optimizing a loop for looking up numbers
by wazat (Monk) on Dec 11, 2013 at 17:02 UTC

    I'm not sure if I follow the problem you are trying to solve. But if I view it as something like finding the set of line segments that are not subsegments, then it might helpful to sort your connection array before doing your checks.

    I suspect that sorting by increasing start and decreasing end, the solution then reduces to checking if the current connection is a subconnection of the previous one. But I could be wrong.

      1-6, 2-3, 4-5
Re: Request help on optimizing a loop for looking up numbers
by Anonymous Monk on Dec 11, 2013 at 17:24 UTC

    Also,

    @used_connections[$start .. $end] = (1) x ($end - $start + 1)
Re: Request help on optimizing a loop for looking up numbers
by VincentK (Beadle) on Dec 11, 2013 at 20:50 UTC
    Thank you everyone for your comments! I really appreciate them. I will take a look at the modules that were suggested in the posts. Even if they do not apply to this particular problem, I always enjoy learning something new.

    Thanks again.