Marcello has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have a hash with international phone numbers which looks like this:
%hash = (31123 => 1, 31456 => 2, 32123 => 3, 32456 => 4, 33 => 5, 3361 => 6, ... );
The first two digits are the country code, the other three are the prefix of the subscribers number. This hash can contain more than 100 of these prefixes, where the prefixes are of variable length. Now I have a single phone number e.g. 31456000001. I would like to do ONE lookup in the hash to find that value corresponding with this prefix (31456, which is 2). But the problem is that the prefixes have a random length as you can see in the hash, so I cannot use the first 5 digits of the phonenumber as the key. It would be nice to specify a *-sign to use as a wildcard in the hash, e.g.:
%hash = (31123 => 1, 31456 => 2, 32123 => 3, 32456 => 4, 33* => 5, 3361* => 6, ... );
where %hash->{33000000000} would give me 5. Is there a way to accomplish this with only ONE lookup instead of using a regexp on all keys in the array? TIA! Regards, Marcel

Replies are listed 'Best First'.
Re: Fast lookup for prefixes in hash
by mirod (Canon) on Oct 31, 2001 at 18:23 UTC

    I don't think you can get it in ONE lookup, but you don't have to go through all the keys either. For 31456 xxxx you just have to test 31, 314, 3145 and 31456:

    #!/bin/perl -w use strict; my %hash = (31123 => 1, 31456 => 2, 32123 => 3, 32456 => 4, 33 => 5, 3361 => 6, ); foreach my $number ( qw( 31456000001 33456000001)) { my @digits= split //, $number, 6; my( $prefix, $value); foreach my $prefix_length ( 1..4) { my $trying= join '', @digits[0..$prefix_length]; if( exists $hash{$trying}) { $value= $hash{$trying}; $prefix= $trying; } } if( defined $value) { print "$number => $value ($prefix)\n"; } else { warn "no value found for $number\n"; } }
Re: Fast lookup for prefixes in hash
by Zaxo (Archbishop) on Oct 31, 2001 at 21:18 UTC

    Here's a somewhat different way (no regex involved):

    my $number="31456000001"; my $len = (grep {exists $hash{substr($number,0,$_)}} 2..5)[-1]; my $result = $hash{substr($number,0,$len)};
    This finds the longest matching key in four iterations, then obtains the corresponding hash value. The exists function is used to avoid autovivifying false keys in %hash.

    After Compline,
    Zaxo

Re: Fast lookup for prefixes in hash
by japhy (Canon) on Oct 31, 2001 at 20:15 UTC
    Construct a regex from the hash keys, and do a match with them.
    $pat = join '|', sort { length($b) <=> length($a) } keys %prefix; if ($number =~ /^($pat)/) { print "Prefix: $1\n"; }

      I think that will probably be a whole lot slower than a (few) hash lookup(s) unless you optimize the regex like is dicussed in this thread: SAS log scanner

              - tye (but my friends call me "Tye")
        I'm well aware of that optimization necessity. I'm putting it in my book somewhere.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Fast lookup for prefixes in hash
by Masem (Monsignor) on Oct 31, 2001 at 18:27 UTC
    I don't believe you can do it with just one, but you can at least you should be able to work backwards:
    my $key = "31456"; # or whatever; my $value = undef; do { $value = $hash{ $key }; # undef if not defined $key = substr( $key, 0, -1 ); } while ( !$value && $key ); # once value is set, this exits with the value # once the key is exhausted, this exists
Re: Fast lookup for prefixes in hash
by jryan (Vicar) on Oct 31, 2001 at 21:41 UTC
Re: Fast lookup for prefixes in hash
by Fastolfe (Vicar) on Nov 01, 2001 at 06:01 UTC
    I think mirod has the most efficient algorithm, trying successively longer substrings. Another implementation:
    sub find_result { my $number = shift; my $prefix_length = length($number); my $result; 1 until defined($result = $hash{substr($number, 0, $prefix_length-- +)}) or $prefix_length <= 0; return $result; }
    This has the added benefit of allowing you to specify more specific mappings that take precedence over more generic. Another possibility might be constructing a tree of sorts, and walking the tree for each digit in your number:
    %hash = ( 1 => { 2 => { 3 => 1, 4 => 2, }, 3 => { 2 => 3, } } ); # Equivalent to ( 123 => 1, 124 => 2, 132 => 3 ) sub find_result { my $number = shift; my $position = 0; my $pointer = \%hash; while (defined(my $current = substr($number, $position++, 1))) { return $current unless ref($pointer->{$current}); $pointer = $pointer->{$current}; } return; }
    Creating this new tree-shaped %hash from a list of prefix => result pairs is an exercise for the reader.
Re: Fast lookup for prefixes in hash
by runrig (Abbot) on Nov 01, 2001 at 08:05 UTC
    One more way. Copy your hash keys to arrays, and do a binary search. The setup is O(n*log(n)), but the advantage is that the lookups are O(log(n)) (Not quite the O(1) as you wanted, but pretty good). (all untested, but pretty much lifted from Array::IntSpan, which I've seen alot of lately :)
    my @array = sort keys %hash; # Keep another array for regexps my @regexes = map qr/^$_/, @array; for my $ph_num (@phone_numbers) { my $num = search($ph_num) || 'not'; print "$ph_num $num found\n" } sub search { my $num = shift; my ($start, $end) = (0, $#array); while ($start < $end) { my $mid = ($start+$end)/2; if ($array[$mid] le $num) { return $hash{$array[$mid]} if $num =~ $regexes[$mid]; $start = $mid+1; } else { $end = $mid-1; } } $num =~ $regexes[$start] && $hash{$array[$start]}; }
    Update: Added slight optimization. We are still occasionally repeating the last regex, that's ok though. I don't think there's an optimal way to eliminate it (or I just don't feel liking looking for it at the moment :).