in reply to Re^2: Fast(er) serialization in Perl
in thread Fast(er) serialization in Perl

>I want to keep the hashes functional

As I told you further down, you can use Tie::Hash to keep the hash interface functional.
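To make the Tie::Hash suggestion concrete, here is a minimal sketch of a tied hash that defers reading a Storable file until the first access. The class name `LazyStorableHash`, the temporary file, and the two sample entries are made up for illustration; whether lazy loading actually helps depends on how often the data is touched at all.

```perl
package LazyStorableHash;    # hypothetical class name, not an existing CPAN module
use strict;
use warnings;
use Storable qw(retrieve);
require Tie::Hash;
our @ISA = ('Tie::Hash');    # base class supplies defaults such as CLEAR

sub TIEHASH {
    my ($class, $file) = @_;
    # Only remember the file name; nothing is read from disk yet.
    return bless { file => $file, data => undef }, $class;
}

# Load the stored hash on first use and cache it afterwards.
sub _data {
    my ($self) = @_;
    $self->{data} //= retrieve($self->{file});
    return $self->{data};
}

sub FETCH    { $_[0]->_data->{ $_[1] } }
sub STORE    { $_[0]->_data->{ $_[1] } = $_[2] }
sub EXISTS   { exists $_[0]->_data->{ $_[1] } }
sub DELETE   { delete $_[0]->_data->{ $_[1] } }
sub FIRSTKEY { my $d = $_[0]->_data; my $reset = keys %$d; each %$d }
sub NEXTKEY  { each %{ $_[0]->_data } }

package main;
use Storable qw(nstore);
use File::Temp qw(tempfile);

# Sample data written to a temporary file (assumption: two made-up entries).
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
nstore({ 'NM_009309' => 1, 'NM_029777' => 3 }, $file);

tie my %genes, 'LazyStorableHash', $file;    # no retrieve() call so far
print $genes{'NM_029777'}, "\n";             # first access triggers retrieve()
```

The calling code keeps using `%genes` like any other hash; only the class behind it decides when and how the data is loaded.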

IMHO there are no general solutions faster than Storable! (*)

You need to provide more information.

If your PHP code is calling your Perl script, you should check whether you can hold the data structure in memory.

You should also check whether you are running into RAM problems that cause massive swapping.

I once sped up a program just by transforming a huge hash into a hash of hashes (by halving the keys). Since the system only retrieved the sub-hashes actually needed from disk swap, I had a fantastic speed gain.
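A rough sketch of what "halving the keys" can look like, assuming the split is made in the middle of each key string. The sample keys, `split_key` and `lookup` are hypothetical names for illustration, not the code from the program mentioned above.

```perl
use strict;
use warnings;

# Hypothetical flat hash standing in for the real data set.
my %flat = (
    'NM_009309' => 1,
    'NM_133983' => 1,
    'NM_029777' => 3,
);

# Split a key in the middle: the first half selects a sub-hash,
# the second half is the key inside that sub-hash.
sub split_key {
    my ($key) = @_;
    my $mid = int(length($key) / 2);
    return (substr($key, 0, $mid), substr($key, $mid));
}

# Build the hash of hashes from the flat hash.
my %nested;
while (my ($key, $value) = each %flat) {
    my ($head, $tail) = split_key($key);
    $nested{$head}{$tail} = $value;
}

# Look up a full key without autovivifying missing sub-hashes.
sub lookup {
    my ($key) = @_;
    my ($head, $tail) = split_key($key);
    return exists $nested{$head} ? $nested{$head}{$tail} : undef;
}

print lookup('NM_029777'), "\n";
```

Nesting by itself does not make lookups faster; the gain described above came from the system only having to bring the sub-hashes that were actually used back from swap.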

Whether this is transferable to your case is unknown, since you have not provided enough information...

Footnote: (*)

from Storable

"SPEED
The heart of Storable is written in C for decent speed. Extra low-level optimizations have been made when manipulating perl internals, to sacrifice encapsulation for the benefit of greater speed."

IMHO it's evident that you need to invest some brainpower to achieve further speed gains!

Re^4: Fast(er) serialization in Perl
by mrguy123 (Hermit) on Apr 11, 2010 at 13:33 UTC
    This is how the hash basically looks (it goes on for about 6 million lines (genes)):
    $VAR1 = { 'microT' => { 'mmu-miR-704' => { 'NM_009309' => '1', 'NM_133983' => '1', 'NM_175563' => '1', 'NM_010889' => '1', 'NM_008302' => '1', 'NM_022023' => '1', 'NM_009567' => '1', 'NM_172938' => '1', 'NM_029777' => '3', 'NM_134189' => '1', 'NM_175025' => '1', 'NM_177327' => '1', 'NM_026807' => '1', 'NM_178779' => '3', 'NM_010770' => '1', 'NM_031998' => '1', 'NM_145584' => '2', 'NM_207682' => '1', 'NM_001005525' => '1', 'NM_080853' => '1', 'NM_145519' => '1', 'NM_031249' => '1', 'NM_172923' => '1', 'NM_001008700' => '1', 'NM_198617' => '1', 'NM_027400' => '1', 'NM_026406' => '2', 'NM_021296' => '2', 'NM_027652' => '1', 'NM_001045530' => '1', 'NM_018830' => '1', 'NM_025314' => '1', 'NM_009041' => '1', 'NM_026829' => '3', 'NM_026618' => '1', 'NM_027472' => '1', 'NM_027870' => '1', 'NM_001033239' => '1', 'NM_026348' => '1', 'NM_008223' => '1', 'NM_009595' => '2', 'NM_146094' => '1', 'NM_144945' => '1', 'NM_019510' => '1', 'NM_001033251' => '1', 'NM_001081213' => '3', 'NM_008031' => '1', 'NM_028719' => '1', 'NM_133352' => '1', 'NM_008133' => '1', 'NM_008317' => '1', 'NM_021327' => '1', 'NM_178751' => '1', 'NM_010260' => '1', 'NM_025683' => '1', 'NM_026383' => '1', 'NM_001081367' => '1', 'NM_001033354' => '2', 'NM_026034' => '1', 'NM_173395' => '1', 'NM_010762' => '1', 'NM_024432' => '1', 'NM_175113' => '1', 'NM_001077425' => '1', 'NM_026374' => '1', 'NM_026655' => '1', 'NM_177345' => '1', 'NM_027412' => '1', 'NM_183187' => '1', 'NM_016687' => '1', 'NM_175640' => '1', 'NM_007559' => '1', 'NM_011269' => '1', 'NM_010252' => '1', 'NM_019657' => '1',
    I'm not very familiar with Tie::Hash, so if you can give me a quick heads-up on how I can use it to save time, that would be great.
      > I'm not very familiar with Tie::Hash, so if you can give me a quick heads-up on how I can use it to save time, that would be great.

      Wouldn't it be even greater if you tried to read the detailed docs and told us what you don't understand? ;-)

      Your hash really looks like a perverted array ...

      Try to figure out how many lookups are performed and whether they can be grouped into smaller data structures.

      BTW: If your university prefers PHP but accepts blocking large parts of the RAM (6 million hash entries can easily result in 1GB or more memory consumption) something seems terribly wrong...

      Is 'NM_001005525' different from 'NM_1005525'?

      If not, you have a serious bug...

      If yes:

      90% of your keys have 6 digits, so putting this data into an array with 1 million entries seems reasonable... resulting in 2 MB of memory consumption if you can limit your values to 65536 distinct numbers, i.e. 16 bits per entry (it's a counter, isn't it?)

      The other keys have 9 digits, so it seems you are coding your genes in groups of 3 digits. All of them start with "001".

      So generally - from what you show - a hash of arrays seems reasonable, where the hash key represents the first 3 digits and the array index the rest.
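
      A sketch of that layout, under several assumptions: IDs look like 'NM_' followed by 6 or 9 digits, 'NM_001005525' and 'NM_005525' are different genes, counts fit into 16 bits, and a count of 0 can stand for "not present". The 2 MB figure only holds if the 1 million counters are packed at 16 bits each, which is done here with vec() on a string; an ordinary Perl array of scalars needs considerably more. The names parse_id, set_count and get_count are made up for this example.

      ```perl
      use strict;
      use warnings;

      # prefix ('' for 6-digit IDs, e.g. '001' for 9-digit IDs)
      #   => packed string of 16-bit counters, indexed by the last 6 digits
      my %counts;

      # Split an ID into (prefix, numeric index). Assumed key shape only.
      sub parse_id {
          my ($id) = @_;
          $id =~ /\ANM_(\d{3})?(\d{6})\z/ or die "unexpected ID format: $id";
          my $prefix = defined $1 ? $1 : '';
          return ($prefix, $2 + 0);
      }

      sub set_count {
          my ($id, $n) = @_;
          die "count out of 16-bit range: $n" if $n < 0 || $n > 65535;
          my ($prefix, $index) = parse_id($id);
          $counts{$prefix} = '' unless defined $counts{$prefix};
          vec($counts{$prefix}, $index, 16) = $n;    # string grows as needed
      }

      sub get_count {
          my ($id) = @_;
          my ($prefix, $index) = parse_id($id);
          return 0 unless defined $counts{$prefix};
          return vec($counts{$prefix}, $index, 16);  # 0 beyond the end of the string
      }

      set_count('NM_029777',    3);
      set_count('NM_001005525', 1);
      print get_count('NM_029777'), "\n";
      ```

      Each prefix costs at most 1,000,000 x 2 bytes = 2,000,000 bytes, and the whole structure can still be hidden behind a tied hash if the callers expect one.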

        OK, I understand what you are saying.
        I will try this direction, and hope I manage to save some time.
        Thanks for your help