Tie one hash two ways?

MrMadScience has asked for the wisdom of the Perl Monks concerning the following question:

I encountered an interesting situation in one of our hash uses. We are using BerkeleyDB::Hash to tie our hashes to files so we can access them "randomly" (over the web). However, some of the people that read these will, on occasion, forget to enter their query in the right case. I was looking at using Hash::Case::Preserve to tie the hash to allow for case-insensitive matching.
The best way I can do this so far is to do something like:

tie my %seqindex, "BerkeleyDB::Hash",
           -Filename => $globals{seq_xref},
           -Flags    => DB_RDONLY
  or goto bail;

tie my %alt_seqindex, 'Hash::Case::Preserve';
foreach my $key (keys %seqindex){
  $alt_seqindex{$key} = $seqindex{$key};
};
[download]

which means that the hash %alt_seqindex is the one I end up doing most of my searches on. It seems that building this second hash takes a very long time. I was wondering if there is a way to make that first tie statement use both the BerkeleyDB and Hash::Case::Preserve modules so that it ends up with a hash that is searchable in a case-insensitive manner?
Or if there are any other ways to do this quicker?
The hash value $globals{$seq_xref} refers to the name of the hash file that we're reading in.

Thank you

Comment on Tie one hash two ways? Download Code

Replies are listed 'Best First'.

Re: Tie one hash two ways?
by Jenda (Abbot) on Oct 01, 2004 at 17:17 UTC

Do you need to preserve the case of the keys? It might be easiest to convert all keys to lowercase in the hash and then lowercase the keys whenever you do any lookup.

If you do have to keep the case you might create a lookup hash that would just map from the lowercased keys to the original ones and then do a two-step lookup:

my %alt_seqindex;
$alt_seqindex{lc $key} = $key while (my ($key) = each %seqindex);
[download]

$value = $seqindex{$alt_seqindex{lc $key}};
[download]

package BerkeleyDB::Hash::CasePreserve;
use BerkeleyDB;
use base 'BerkeleyDB::Hash';

my %LCKeys; # to keep the lc $key => $key mapping for all hashes
sub TIEHASH {
    my $obj = &BerkeleyDB::Hash::TIEHASH;
    my %lckeys;
    my $key = $obj->FIRSTKEY();
    $lckeys{lc $key} = $key;
    $lckeys{lc $key} = $key while $key = $obj->NEXTKEY($key);
    $LCKeys{$obj->[0]} = \%lckeys;
    return $obj;
}

sub FETCH {
    my $self = shift;
    my $key = shift();
    $key = $LCKeys{$self->[0]}->{lc $key}
        or return;
    return $self->SUPER::FETCH($key);
}

sub EXISTS {
    my $self = shift;
    my $key = shift();
    $key = $LCKeys{$self->[0]}->{lc $key}
        or return;
    return $self->SUPER::EXISTS($key);
}

sub STORE {
    my $self = shift;
    my ($key, $value) = @_;
    if (exists $LCKeys{$self->[0]}->{lc $key}) {
        $key = $LCKeys{$self->[0]}->{lc $key};
    } else {
        $LCKeys{$self->[0]}->{lc $key} = $key;
    }
    return $self->SUPER::STORE($key, $value);
}

sub DESTROY {
    my $self = shift();
    delete $LCKeys{$self->[0]};
}

1;
[download]

Fletch

Jenda
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
-- Rick Osborne

[reply]
[d/l]
[select]

Re: Tie one hash two ways?
by Fletch (Bishop) on Oct 01, 2004 at 16:13 UTC

Probably the simplest thing would be to subclass BerkeleyDB::Hash and override the FETCH() and STORE() methods something like (untested):

package BerkeleyDB::Hash::CasePreserve;

sub FETCH {
  my( $self, $key ) = @_;

  unless( $self->EXISTS( $key ) ) {
    my $key_re = qr/\A$key\z/i;
    my $check = $self->FIRSTKEY( );
    do { 
      if( $check =~ $key_re ) {
        $key = $check;
        last;
      }
    } while( $check = $self->NEXTKEY( $check ) );
  }

  return $self->SUPER::FETCH( $key );
}
[download]

Update: Added anchored $key_re.

[reply]
[d/l]
[select]

Re: Tie one hash two ways?
by PodMaster (Abbot) on Oct 02, 2004 at 15:05 UTC

which means that the hash %alt_seqindex is the one I end up doing most of my searches on. It seems that building this second hash takes a very long time

Here's what I'd do (assumes your keys don't end in "_\0_key")


package BerkeleyDB::Hash::CasePreserve;

use BerkeleyDB;
use base qw[ BerkeleyDB::Hash ];

sub FETCH { $_[0]->SUPER::FETCH( lc $_[1] ) }
sub EXISTS { $_[0]->SUPER::EXISTS( lc $_[1] ) }
sub NEXTKEY {
    my $next = "";
    do {
        $next = $_[0]->SUPER::NEXT($_[1]);
    } until
        0 < index
               $next,
               "_\0_key",
                - length "_\0_key";

    return $_[0]->SUPER::FETCH( $next ); # oRiGiNal cAsE
}

sub STORE {
    $_[0]->SUPER::STORE(
        lc($_[1])."_\0_key",
        $_[1],
    ); # oRiGiNaL cAsE

    return $_[0]->SUPER::STORE( $_[1], $_[2] );
}
[download]

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

[reply]
[d/l]