belg4mit has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of, possibly sparse, numbers. I wish to find a range of the numbers from the list between values. These bracketing values may not be in the list itself (the low end should use the next number and the high end should use the previous). I have a working algorythm, but am not sure if it's best. This sounds like it might be a standard problem, and consequently have a standard solution.
my(@F, @G); @F = ( 1, 2, 4, 8, 16, 32..64 ); #Generate a reverse map for(my $i=0; $i<scalar @F; $i++){ $G[$F[$i]] = $i; } sub range{ my($lo, $hi) = @_; $lo++ until defined($G[$lo]); $hi-- until defined($G[$hi]); return $lo .. $hi; } print join(',', range(25,35)), "\n"; print join(',', range(50,75)), "\n";

--
I'm not belgian but I play one on TV.

Replies are listed 'Best First'.
Re: mapping lists
by tall_man (Parson) on Jan 24, 2003 at 20:50 UTC
    Here's a quick example showing your problem can be done using a binary search, even though the "25" is not in the list.

    Note: my version returns only elements actually in the list. If you want all the integers in between, just return $F[$lo_pos]..$F[$hi_pos].

    Update: It now appears that the index positions are wanted instead. No problem, that's just $lo_pos..$hi_pos.

    use strict; use Search::Binary; my $lastIndex; sub reader { my ($handle, $val, $index) = @_; if (!defined $index) { $lastIndex++; } else { $lastIndex = $index; } return ($val <=> $handle->[$lastIndex], $lastIndex); } my @F = ( 1, 2, 4, 8, 16, 32..64 ); sub range{ my($lo, $hi) = @_; my $lo_pos = binary_search(0,$#F,$lo,\&reader,\@F); my $hi_pos = binary_search(0,$#F,$hi,\&reader,\@F); $hi_pos = $#F if ($hi_pos > $#F); return @F[$lo_pos..$hi_pos]; } print join(',', range(25,35)), "\n"; print join(',', range(50,75)), "\n";
      I'd like to make a minor tweak to my solution. There's a sixth parameter "size" for binary_search, which is the point at which it reverts to a linear search, and it's defaulted to 512. That means the small @F array will be done with a linear search, after all.

      I set the size down and did one more optimization, as follows:

      my $lo_pos = binary_search(0,$#F,$lo,\&reader,\@F,2); my $hi_pos = binary_search($lo_pos,$#F,$hi,\&reader,\@F,2);
      I added a counter to the reader function and got the following output:
      Did 6 probes for low Did 6 probes for high 32,33,34,35 Did 6 probes for low Did 7 probes for high 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64
      This is compared to a total of 78 probes the original way.
Re: mapping lists
by thinker (Parson) on Jan 24, 2003 at 19:38 UTC
    Hi Belg4mit,

    Here is my attempt
    #!/usr/bin/perl -w my(@F, @G); @F = ( 1, 2, 4, 8, 16, 32..64, 17 ); sub range{ my($lo, $hi) = @_; @G = sort grep {$_ >= $lo and $_ <= $hi} @F; return $G[0]..$G[$#G]; } print join(',', range(25,35)), "\n"; print join(',', range(50,75)), "\n"; print join(',', range(12,35)), "\n";

    this produces
    32,33,34,35 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35

    hope this helps
    thinker
    Update Changed < to <=
    thanks belg4mit
    Update2 Read the question properly, and return a range :-)
Re: mapping lists
by blokhead (Monsignor) on Jan 24, 2003 at 19:45 UTC
    If the values in @F are very sparse, you should probably use a hash to make the reverse map instead of the array @G. If you had a case where @F = qw/1 3 5 1000000 2000000/;, you wouldn't want to reverse map that into an array, which would become 2 million elements large. Apart from that, it's just plain easier to write a reverse map into a hash: @G{@F} = ();

    It's not clear to me from your description, but is the thing you really want the set intersection of the set @F with 25..35 and with 50..75? If this is the case, maybe one of the Set:: or Array:: modules might be worth inspecting (Set::IntRange?). Although I'm not familiar with what's out there, I would assume there would be some modules out there highly optimized for the intersection operation.

    If you're not looking for a set intersection, I don't see any obvious optimizations here, other than the possible sparseness of the reverse map that I mentioned.

    Update: I noticed your code always returns a range, even if some of the intermediaries in the range aren't in @F. thinker gives a very nice algorithm for set intersection with a range, but if you need to always return a range (even including elements not in @F), here's an adaptation:

    sub range{ my($lo, $hi) = @_; ($lo, $hi) = (sort grep {$_ >= $lo and $_ <= $hi} @F)[0,-1]; return $lo..$hi; }
    Depending on the data, this might be better than your original code. While this has to process your entire set of values, if the values are extremely sparse, you may end up ++'ing $lo and --'ing $hi all day in your algorithm.

    blokhead

      I thought of using a hash, but how to handle a range wherein the limits are not in the actual set was not obvious to me. I guess I could use the same thing but use exists en lieu of defined. But then again aren't hashes notoriously suboptimal for numbers? (strings of a limited set of characters)

      Not interesections, trying to find the existing elements within the defined range.

      --
      I'm not belgian but I play one on TV.

Re: mapping lists
by tall_man (Parson) on Jan 24, 2003 at 19:52 UTC
    If your lists are large, linear-search algorithms will be very slow. Your program and the one by thinker use linear searches. You can find points in a sorted list much faster with a binary search, which is O(log n). The Search::Binary module provides a generic implementation. (The learning curve for that module looks a bit steep, however.)
      And how am I supposed to do a binary search for somethign that doesn't exist in the tree?

      --
      I'm not belgian but I play one on TV.

        It's not a problem for a binary search that the value does not exist in the tree. It will find the nearest place for it.

        Here is an excerpt from the binary search documentation:

        use Search::Binary; $pos = binary_search($min, $max, $val, $read, $handle, [$size]);

        "binary_search" implements a generic binary search algo­rithm returning the position of the first record whose index value is greater than or equal to $val.

Re: mapping lists
by BrowserUk (Patriarch) on Jan 24, 2003 at 23:32 UTC

    Another binary chop solution

    #! perl -slw use vars qw[$MAX $SIZE $LO $HI]; use strict; $MAX ||= 100; $SIZE ||= 10; $LO ||= 25; $HI ||= 50; sub bsearch { my ($ref, $val) = @_; my ($lo, $hi) = (0, $#$ref); while ($lo < $hi) { my $i = ($lo+$hi)>>1; my $bool = $val <=> $ref->[$i]; if ($bool < 0) { $hi = $i; } elsif ($bool > 0) { $lo = $i + 1; } else { return $i; } } return $hi; } sub range { my ($ref, $lo, $hi) = @_; my $lidx = bsearch($ref, $lo); $lo = ($ref->[$lidx] > $lo and $lidx > 0) ? $ref->[--$lidx] : $ +ref->[$lidx]; my $hidx = bsearch($ref, $hi); $hi = ($ref->[$hidx] < $hi and $hi < $#$ref) ? $ref->[$hidx++] : $ +ref->[$hidx]; $lo .. $hi; } my @F; my ($start, $end) = (0,0); #! Gen some test data while ($start + $SIZE < $MAX) { $start += int rand($SIZE); $end = int $start + rand($SIZE); push @F, $start .. $end; $start = $end; } #! display it print "@F"; #! and a range from it encompassing two (possibly absent values) print "@{[ range(\@F, $LO, $HI) ]}"; __DATA__ C:\test>229675 -MAX=50 -SIZE=5 -LO=15 -HI=35 2 3 4 5 5 6 7 8 9 12 13 17 18 19 20 21 21 22 23 24 25 29 30 31 33 34 3 +6 37 40 41 42 43 44 45 46 47 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 3 +6

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: mapping lists
by jdporter (Paladin) on Jan 24, 2003 at 21:07 UTC
    From your description and sample solution, it is not at all clear what the function should return when the subtended set is not a range of contiguous integers. I.e., given your value of @F, what should range(1,8) return? (1,2,4,8) or (1,2,3,4,5,6,7,8) ?

    Also, what guarantees will there be, regarding the orderedness and uniqueness of the elements of @F ?

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

      I thought it was, given the examples. However, please read the soon to be posted update to see the real problem.

      Uniqueness is guaranteed. The list is probably sorted.

      --
      I'm not belgian but I play one on TV.

Re: mapping lists
by l2kashe (Deacon) on Jan 24, 2003 at 19:54 UTC
    Dont know if its faster, cause I didnt benchmark, but I like hashes a little better for this type of work, and for me its a little cleaner to write the same thing like so
    %data = map { $_ => 1 } @F; sub range{ my(@ret); for ($_[0] .. $_[1]) { push(@ret, $_) if ($data{$_}); } return @ret; }
    Which produces the same output as your code... Critique any one?
    Edit: I guess it would make more sense to initialize the data into a hash, as opposed to the array and then using map.
    Also thinking about it I would probably pass a third arg being the ref to the data hash, I wanted to work on.

    Edit_2: Doh that shoulda been defined in the loop :P.. o well.

    /* And the Creator, against his better judgement, wrote man.c */
UPDATE: mapping lists
by belg4mit (Prior) on Jan 24, 2003 at 22:12 UTC
    Doh! Sorry to have led so many on a wild goose chase (thanks for all the replies though). But I was so wrapped up in solving the problem of the range limits not being part of the set that I left out a HUGE set of the problem. So, consider that a warmup.

    Given the original @F(values are unique, and are probably sorted) return the indices of @F whose values fall within the supplied range.

    UPDATE: Although I suppose all that would be required then is to use one of these search techniques (which?) and reverse map the values of @F to the indices. Does this added step give one of the techniques an advatage over the others?

    --
    I'm not belgian but I play one on TV.

      The values are probably sorted? Oh-oh.

      If you aren't guaranteed they are sorted, you can't use a binary search. You'd have to do a linear pass to test if they are really sorted, and sorting them yourself isn't worth it unless you are going to do lots of range checks on the same data.

      Here is a simple linear method that requires no side data structures.

      use strict; my @F = ( 1, 2, 4, 8, 32..64, 26 ); sub range{ my($lo, $hi) = @_; return grep {$F[$_] >= $lo and $F[$_] <= $hi} 0..$#F; } print join(',', range(25,35)), "\n";
      This is basically what thinker and blokhead did except that my grep is on the indices instead of the values of @F.
      I was thinking along the lines of:
      #!/usr/bin/perl -w use strict; use Quantum::Superpositions; my (@F, @G, @range); @F = (1, 2, 4, 8, 16, 32..64); my ($i, $j); for($i=0; $i<scalar @F; $i++){ $G[$j++] = $i if (any(@range) == any($F[$i])); } print join(',', @G), "\n";