ViceRaid has asked for the wisdom of the Perl Monks concerning the following question:

Afternoon

I'm looking for something that works like a hash, but whose keys are ranges of integers. When I want a value from a hash, I give it a single integer, and it should return the value whose range key includes the given integer. Here's an implementation:

package RangedHash; sub TIEHASH { bless({}, shift); } # the key should be the start point of the range sub STORE { $_[0]->{$_[1]} = $_[2]; } sub FETCH { my ($self, $point) = @_; my $cursor = 0; foreach ( sort keys %$self ) { return $self->{$cursor} if ( $point >= $cursor && $point < $_ +); $cursor = $_; } return $self->{$cursor} } package main; my %ranged_hash; tie(%ranged_hash, "RangedHash"); $ranged_hash{0} = 'foo'; $ranged_hash{10} = 'bar'; $ranged_hash{20} = 'qux'; $range_hash{3}; # 'foo' $range_hash{14}; # 'bar' $range_hash{27}; # 'qux'

However, for high values, it has to look through every other key first, which rather defeats the point of hashes. In the real application, I'm working with these ranged hashes with hundreds or thousands of pairs. I'm wondering if there's a better algorithm, or prior art here? It doesn't necessarily have to work with tie, btw.

(If it's of interest, I'm working with long strings in a GUI that are composed of fragments of other strings. Some of the time, I need to be able to work out for a given point in the long composite string where the original fragment came from.)

Update: I should have stated that the ranges are:

TIA,
alex

Replies are listed 'Best First'.
Re: Hash keyed on ranges
by ColonelPanic (Friar) on May 07, 2004 at 13:04 UTC
    If the ranges tend to be short, this would be pretty simple and efficient:
    --$num_to_look_for until(exists $hash{$num_to_look_for});


    When's the last time you used duct tape on a duct? --Larry Wall

      ++ Thanks, I really like this suggestion. Much of the time the ranges tend to be short - perhaps around 50 to 100 wide. However, sometimes they might up to several thousand wide, as they might be representing multiple pages-worth of english text. I'll give it a whirl on some real examples.

Re: Hash keyed on ranges
by BrowserUk (Patriarch) on May 07, 2004 at 13:14 UTC

    There is not really enough information here yet.

    How would the extent each of the ranges be established?

    Do the ranges overlap?

    Are the ranges contiguous?

    Can you reasonably guestimate the minimum and maximium integer values?

    There are various ways of dealing with ranges, some more complicated than others (I remember Abigail-II recommending red-black trees to a similar question once, but can't locate it now), but which would be the simplest or fastest or otherwise 'best' depends very much on the details of your requirements.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      How would the extent each of the ranges be established?

      They come from the source fragments of the composite string. Say the composite string is made up of three fragments "FOO" "QUUX" and "BAR". Each of these fragments is a string-based object, with extra info attached to it that I need to access. My ranged hash (or maybe array - thanks dragonchild) would then look like:

      # ... stands for the fragment's metadata { 0 => Fragment->new('FOO', ...), 3 => Fragment->new('QUUX', ...), 7 => Fragment->new('BAR', ...) } # and the composite string looks like "FOOQUUXBAR";

      I want to be able to get back to the source fragment object (and its associated data) quickly from a given offset in the composite string.

      Do the ranges overlap?

      Nope

      Are the ranges contiguous?

      Yup

      Can you reasonably guestimate the minimum and maximium integer values?

      Not easily, no - the fragments might be quite short or several thousand characters long, and the total number of fragments is also very variable.

      Thanks/

        Assuming "a few thousand" is less than say low tens of thousands, then using an array, and storing the (same)object in each element of the range would certainly be the fastest method. The limitation being the memory consumption if your array is likely to get very large.

        { my array; sub set{ my( $frag, $start, @meta ) = @_; @array[ $start .. ( $start + length( $freg ) -1 ) ] = ( Fragment->new( $frag, @meta ) ) x length( $frag ); return $array[ $start ]; } sub find{ my( $offset ) = @_; return $array[ $offset ]; } } ... set( 'FOO', 0, ... ); set( 'QUUX', 3, ... ); set( 'BAR', 7, ... ); ... my $found = find( 5 ); print $found->value; # prints 'QUUX';

        If that consumes too much space, then you could use a scalar to represent the array (in less space) and an array to store the objects.

        { my $pseudo; my @frags; sub set{ my( $frag, $offset, @meta ) = @_; push @frags, Fragment->new( $frag, @meta ); $pseudo .= pack( 'v', $#frags ) x length( $frag ); return $frags[ -1 ]; } sub find{ my $offset = @_; my $index = unpack( 'v', substr( $pseudo, $offset, 2 ) ); return $frags[ $index ]; } }

        With this, you push the objects onto an array once for each fragment, and then store the index at which you pushed it, converted to it's binary representation, once for each position it represents.

        This way is also very fast but reduces the memory consumption. A fragment that contains 100 characters requires about 200 + 24 bytes of storage, rather than 100 * 24 bytes. The limitation is that using a 16-bit integer to hold the indexes limits you to 64K fragments. You could move to 32-bits whch would probably cover mosts needs, with the obvious increase in memory usage.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
Re: Hash keyed on ranges
by Limbic~Region (Chancellor) on May 07, 2004 at 13:56 UTC
    ViceRaid,
    Just to annotate the two pieces of advice I gave you in the CB.
    • Always check to see if number entered is an exact match before searching
    • Use a parallel array to allow a modified binary search
    The second one requires the number to be entered with increasing values. I originally did not post because you did not state that as a given. Our conversation in the CB lead to that as a possibility. You do need to consider what to do if the entered number is lowerer than the lowest possible range.

    Cheers - L~R

Re: Hash keyed on ranges
by dragonchild (Archbishop) on May 07, 2004 at 13:09 UTC
    Why are you using a hash? Why don't you use an array for this? If you really feel guilty about space, you can use an array whose values for the given range are the key into the hash.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      If space is really too much of an issue to have an array of every value in the range, but you need more speed than a simplistic solution like my previous post will give you, than you could store the sorted list of keys from your hash and binary search it. This is probably the best compromise if you have major speed and memory concerns.

      Of course, this all depends on what the data is like, and how it is used. A simpler solution will almost certainly suffice, unless you are accessing megabytes of data thousands of times.


      When's the last time you used duct tape on a duct? --Larry Wall
Re: Hash keyed on ranges
by ysth (Canon) on May 07, 2004 at 16:13 UTC
    Sounds like you want an array of end-of-range values to binary search, and then a hash keyed on the end-of-range value found. (I suggest using the end-of-range value because your values are contiguous for 0..n; if the binary search fails you know you are higher than n without having to maintain/check it separately. Negative input should obviously also be checked for.)