in reply to A better (problem-specific) perl hash?
Using the basic idea that if you pack 2-bits for each character together to form a number, your 12-character words' can be represented by an number 0 .. 2**24, you can use that to index into an array of packed integers each requiring 4 bytes, you arrive at a storage requirement of 2**28 bytes/256 MB to cover your 12-bytes words.
The transforming the words into integers can be done like this, (the quickest of 4 methods I tried):
use List::Util qw[ reduce ]; sub xform{ my $word = shift; $word =~ tr[ACGT][\x00\x01\x02\x03]; reduce{ ($a << 2) + $b } 0, unpack 'C*', $word; }
Then you need an implementation of a packed integer array. I tried a tied array, which has the convenience of standard array notation, $index[ xform( $word ) ]++;, but proves to be rather slow. Converting that to an OO representation was about 20% faster:
package OO::Index; sub new { my $self; open RAM, '>', \$self; seek RAM, 2**28, 0; print RAM chr(0); close RAM; return bless \$self, $_[0]; } sub get { my( $self, $idx ) = @_; return unpack 'V', substr $$self, $idx*4, 4; } sub set { my( $self, $idx, $value ) = @_; substr $$self, $idx*4, 4, pack 'V', $value; return $value; } 1;
A crude benchmark over 1e6 random 12-char keys shows this to be about 40% slower than a hash, but requires only 20% of the memory:
#! perl -sw use strict; package OO::Index; sub new { my $self; open RAM, '>', \$self; seek RAM, 2**28, 0; print RAM chr(0); close RAM; return bless \$self, $_[0]; } sub get { my( $self, $idx ) = @_; return unpack 'V', substr $$self, $idx*4, 4; } sub set { my( $self, $idx, $value ) = @_; substr $$self, $idx*4, 4, pack 'V', $value; return $value; } return 1 if caller; package main; use Benchmark qw[ cmpthese ]; use List::Util qw[ reduce ]; $a = $b; sub rndStr{ join'', @_[ map{ rand @_ } 1 .. shift ] } sub xform{ my $word = shift; $word =~ tr[ACGT][\x00\x01\x02\x03]; reduce{ ($a << 2) + $b } 0, unpack 'C*', $word; } cmpthese 5, { hash => q[ my %index; $index{ rndStr 12, qw[ A C G T ] }++ for 1 .. 1e6; ], oo => q[ my $index = new OO::Index; for ( 1 .. 1e6 ) { my $i = xform rndStr 12, qw[ A C G T ]; $index->set( $i, 1+ $index->get( $i ) ); } ], }; __END__ C:\test>junk s/iter oo hash oo 35.3 -- -58% hash 15.0 136% --
HTH
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: A better (problem-specific) perl hash?
by adamk (Chaplain) on Feb 18, 2006 at 15:31 UTC | |
by BrowserUk (Patriarch) on Feb 18, 2006 at 16:23 UTC | |
by BrowserUk (Patriarch) on Feb 18, 2006 at 17:31 UTC |