in reply to Fast(er) serialization in Perl

What do the keys and values look like?

With "millions of keys" chances are high that they are highly uniform.

That means you might be able to transform it into an array or array-like structure.

You might even be able to store/load this array with the help of pack/unpack, which would speed things up significantly.
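A minimal sketch of that idea, assuming the values are small integers (the value names here are made up for illustration): instead of serializing millions of individual hash entries, pack the whole value array into one binary string that can be written and read in a single I/O operation.

```perl
use strict;
use warnings;

# Hypothetical example values (small integer counts).
my @values = (1, 3, 1, 2, 1);

# Pack all values into one binary string ('N*' = 32-bit big-endian unsigned).
my $packed = pack('N*', @values);

# ... write $packed to a file with one print, slurp it back later ...

# Unpack restores the whole array at once -- far cheaper than
# deserializing each entry separately.
my @restored = unpack('N*', $packed);
```

The win comes from doing one big memory copy rather than per-key bookkeeping, at the cost of giving up arbitrary value types.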

If it's "web based", does that mean it's a CGI? You might want to consider FastCGI or mod_perl to keep the data in memory between different queries.

Replies are listed 'Best First'.
Re^2: Fast(er) serialization in Perl
by mrguy123 (Hermit) on Apr 11, 2010 at 12:23 UTC
    Thanks
    Because most of the logic of the program is based on the hashes (and I'm not sure I want to change that just yet), I want to keep the hashes functional, so I'm not sure I can use the array option.
    Regarding the web part: because of University guidelines, the web based part is actually in PHP :(.
    However, it is not the bottleneck that needs fixing most urgently.
    The question is: can I keep my giant hashes and still save time (mostly on loading them)?
      >I want to keep the hashes functional

      as I told you further down, you can use Tie::Hash to keep the interface functional.

      IMHO there are no general solutions faster than Storable! *

      You need to provide more info.

      If your PHP is calling your Perl script, you should check whether you can hold the data structure in memory.

      You may also check that you're not running into RAM problems causing massive swapping.

      I once sped up a program just by transforming a huge hash into a hash of hashes (by halving the keys). Since the system only retrieved the sub-hashes actually needed from disk swap, I got a fantastic speed gain.
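      A rough sketch of that key-halving idea, using made-up accession keys in the style shown later in this thread: each long key is split into a prefix and a suffix, so a lookup only touches the (much smaller) sub-hash for that prefix.

```perl
use strict;
use warnings;

# Hypothetical flat hash with long keys.
my %flat = ('NM_009309' => 1, 'NM_133983' => 1, 'NM_175563' => 1);

# Split each key into prefix + suffix to build a hash of hashes.
my %nested;
for my $key (keys %flat) {
    my $prefix = substr($key, 0, 6);   # e.g. 'NM_009'
    my $suffix = substr($key, 6);      # e.g. '309'
    $nested{$prefix}{$suffix} = $flat{$key};
}

# A lookup now goes through two small hashes instead of one huge one.
my $val = $nested{'NM_009'}{'309'};
```

Whether this helps depends on how uniformly the keys split and on the memory access pattern; the gain in the anecdote above came from only paging in the sub-hashes that were actually used.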

      Whether this is transferable to your case is unknown, since you don't provide enough info...

      Footnote: (*)

      from Storable

      "SPEED
      The heart of Storable is written in C for decent speed. Extra low-level optimizations have been made when manipulating perl internals, to sacrifice encapsulation for the benefit of greater speed."

      IMHO it's evident that you need to invest some brainpower to achieve further speed gains!
        This is how the hash basically looks (it goes on for about 6 million lines (genes)):
        $VAR1 = { 'microT' => { 'mmu-miR-704' => { 'NM_009309' => '1', 'NM_133983' => '1', 'NM_175563' => '1', 'NM_010889' => '1', 'NM_008302' => '1', 'NM_022023' => '1', 'NM_009567' => '1', 'NM_172938' => '1', 'NM_029777' => '3', 'NM_134189' => '1', 'NM_175025' => '1', 'NM_177327' => '1', 'NM_026807' => '1', 'NM_178779' => '3', 'NM_010770' => '1', 'NM_031998' => '1', 'NM_145584' => '2', 'NM_207682' => '1', 'NM_001005525' => '1', 'NM_080853' => '1', 'NM_145519' => '1', 'NM_031249' => '1', 'NM_172923' => '1', 'NM_001008700' => '1', 'NM_198617' => '1', 'NM_027400' => '1', 'NM_026406' => '2', 'NM_021296' => '2', 'NM_027652' => '1', 'NM_001045530' => '1', 'NM_018830' => '1', 'NM_025314' => '1', 'NM_009041' => '1', 'NM_026829' => '3', 'NM_026618' => '1', 'NM_027472' => '1', 'NM_027870' => '1', 'NM_001033239' => '1', 'NM_026348' => '1', 'NM_008223' => '1', 'NM_009595' => '2', 'NM_146094' => '1', 'NM_144945' => '1', 'NM_019510' => '1', 'NM_001033251' => '1', 'NM_001081213' => '3', 'NM_008031' => '1', 'NM_028719' => '1', 'NM_133352' => '1', 'NM_008133' => '1', 'NM_008317' => '1', 'NM_021327' => '1', 'NM_178751' => '1', 'NM_010260' => '1', 'NM_025683' => '1', 'NM_026383' => '1', 'NM_001081367' => '1', 'NM_001033354' => '2', 'NM_026034' => '1', 'NM_173395' => '1', 'NM_010762' => '1', 'NM_024432' => '1', 'NM_175113' => '1', 'NM_001077425' => '1', 'NM_026374' => '1', 'NM_026655' => '1', 'NM_177345' => '1', 'NM_027412' => '1', 'NM_183187' => '1', 'NM_016687' => '1', 'NM_175640' => '1', 'NM_007559' => '1', 'NM_011269' => '1', 'NM_010252' => '1', 'NM_019657' => '1',
        I'm not very familiar with Tie::Hash, so if you could give me a quick heads-up on how to use it to save time, that would be great
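        A minimal sketch of what the tie suggestion above might look like (the class name and keys are invented for illustration): by subclassing Tie::StdHash you keep the normal `$h{key}` syntax everywhere in your program, while the overridden FETCH is free to pull values from a packed array, a disk file, or whatever faster backing store you move to.

```perl
package LazyHash;
use strict;
use warnings;
use Tie::Hash;                 # provides Tie::StdHash
our @ISA = ('Tie::StdHash');

# Override FETCH: this is where custom retrieval logic (unpacking,
# on-demand loading, etc.) would go. Here it just delegates.
sub FETCH {
    my ($self, $key) = @_;
    # ... fetch/decode the value on demand here instead of up front ...
    return $self->{$key};
}

package main;
tie my %h, 'LazyHash';
$h{'NM_009309'} = 1;           # STORE handled by Tie::StdHash
my $v = $h{'NM_009309'};       # goes through our FETCH
```

The point is that the callers never change: all the hash-based logic in the program keeps working while the storage strategy behind it is swapped out.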
      the web based part is actually in PHP

      you should benchmark whether Storable is really your bottleneck; starting a non-persistent Perl process takes some time ...
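      One way to check, sketched with a tiny demo hash and a made-up file name: time the retrieve() call by itself with Time::HiRes, so Storable's share of the startup cost is separated from process launch and everything else.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
use Storable qw(store retrieve);

# Hypothetical demo data; in practice this would be the real giant hash.
my %demo = (a => 1, b => 2);
store(\%demo, 'demo.storable');

# Time only the deserialization step.
my $t0      = [gettimeofday];
my $href    = retrieve('demo.storable');
my $elapsed = tv_interval($t0);
printf "retrieve took %.4f s\n", $elapsed;

unlink 'demo.storable';
```

If the retrieve() time dominates the total, serialization is the right thing to attack; if not, the fix lies elsewhere (process startup, swapping, ...).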

        The retrieval of the hash takes 12 seconds (out of less than 20 secs overall) so if I can take that number down a bit, I'm happy