mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I have been working on a web-based bioinformatics program written in Perl by a non-programmer.
My goal is to make it run faster. One thing I noticed is that the program creates huge hashes (millions of keys) that are nearly the same every time (depending on the input).
What I did was create these hashes beforehand and store them using Storable.
Then, each time the program runs, I retrieve them, saving a lot of time (it now runs in about half the time).
The problem is that retrieving the hashes still takes a very expensive 12 seconds (out of the 18 seconds the program takes to run).
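For reference, the precompute-and-retrieve approach described above looks roughly like this (file name and data are made up; the real hashes have millions of keys):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# One-time step: build the big hash and freeze it to disk.
my %big = map { "key$_" => $_ * 2 } 1 .. 1000;   # stand-in data
store \%big, 'big_hash.sto';

# Every run: one retrieve call instead of rebuilding millions of keys.
my $href = retrieve 'big_hash.sto';
print $href->{key42}, "\n";   # prints 84
```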

My questions are:

1) Are there other serialization modules that are faster than Storable?

2) Any other advice on how to make very large hashes work faster?

I know the best solution would be to rewrite the program, but I'm not sure that's an option right now. I also tried storing the data in a DB, but then the program requires many DB calls, which take too much time.
Thanks in advance,
mrguy123

I have yet to see any problem, however complicated, which, when you looked at it in the right way, did not become still more complicated. - Poul Anderson

Replies are listed 'Best First'.
Re: Fast(er) serialization in Perl
by The Perlman (Scribe) on Apr 11, 2010 at 11:47 UTC
    What do the keys and values look like?

    With "millions of keys" chances are high that they are highly uniform.

    That means you might be able to transform it into an array or array-like structure.

    You might even be able to store/load this array with the help of pack/unpack, which would speed things up significantly.
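A minimal sketch of the pack/unpack idea, assuming the keys are dense integer IDs and the values fit in unsigned 32-bit numbers (both assumptions need to hold for your data):

```perl
use strict;
use warnings;

# Stand-in data: the value for ID $i is $i * 10.
my @values = map { $_ * 10 } 0 .. 9;

# "Serialize" the whole array into one fixed-width byte string.
my $packed = pack 'N*', @values;        # 4 bytes per value

# Constant-time lookup by ID, with no per-key hash overhead:
my $id  = 7;
my $val = unpack 'N', substr $packed, $id * 4, 4;
print "$val\n";                         # prints 70
```

The byte string can be written to disk with a plain print and slurped back in a single read, which is typically much cheaper than deserializing millions of individual hash entries.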

    If it's "web based", does that mean it's a CGI? You might want to consider FastCGI or mod_perl to keep the data in memory between different queries.

      Thanks
      Because most of the program's logic is based on the hashes (and I'm not sure I want to change that just yet), I want to keep the hashes functional, so I'm not sure I can use the array option.
      Regarding the web time: because of University guidelines, the web-based part is actually in PHP :(.
      However, it is not the bottleneck that needs fixing most urgently.
      The question is: can I keep my giant hashes and still save time (mostly on loading them)?
        >I want to keep the hashes functional

        As I told you further down, you can use Tie::Hash to keep the interface functional.

        IMHO there are no general solutions faster than Storable! *

        You need to provide more info.

        If your PHP is calling your Perl script, you should check whether you can hold the data structure in memory.

        You may also check that you're not running into RAM problems causing massive swapping.

        I once sped up a program just by transforming a huge hash into a hash of hashes (by halving the keys). Since the system then only retrieved the sub-hashes actually needed from disk swap, I got a fantastic speed gain.

        Whether this is transferable to your case is unknown, since you don't provide enough info...

        Footnote: (*)

        from Storable

        "SPEED
        The heart of Storable is written in C for decent speed. Extra low-level optimizations have been made when manipulating perl internals, to sacrifice encapsulation for the benefit of greater speed."

        IMHO it's evident that you need to invest some brainpower to achieve further speed gains!
        the web based part is actually in PHP

        You should benchmark whether Storable is really your bottleneck; starting a non-persistent Perl process takes some time...
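A quick way to check that is to time the retrieve call by itself with the core Time::HiRes module and compare it against the total runtime. This sketch builds a stand-in file so it runs anywhere; point it at your real Storable file instead:

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Time::HiRes qw(gettimeofday tv_interval);

# Stand-in file; replace with your real Storable file.
nstore { map { $_ => 1 } 1 .. 10_000 }, 'bench.sto';

my $t0 = [gettimeofday];
my $h  = retrieve 'bench.sto';
printf "retrieve took %.3fs for %d keys\n",
    tv_interval($t0), scalar keys %$h;
```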

Re: Fast(er) serialization in Perl
by BrowserUk (Patriarch) on Apr 11, 2010 at 16:00 UTC

    Like others, I think there is almost certainly a better (more space-efficient and faster to load) method of storing your data that would be equally if not more efficient for performing your lookups and would require minimal changes to your script.

    The key to transforming your hashes into that form is more detail about the nature of the data. If you run this script against your hash structure (fill in the name of your Storable file and redirect the output to another file) and post the output, you might get suggestions for how to perform that transformation:

    #! perl -slw
    use strict;

    use List::Util qw[ max minstr maxstr ];
    use Storable qw[ retrieve ];

    my $h = retrieve '/path/to/yourfile';

    for my $l1 ( keys %{ $h } ) {
        for my $l2 ( keys %{ $h->{ $l1 } } ) {
            printf "$l1->$l2: N: %d minL3: %s maxL3: %s maxVal: %d\n",
                scalar( keys %{ $h->{ $l1 }{ $l2 } } ),
                minstr( keys %{ $h->{ $l1 }{ $l2 } } ),
                maxstr( keys %{ $h->{ $l1 }{ $l2 } } ),
                max( values %{ $h->{ $l1 }{ $l2 } } );
        }
    }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Fast(er) serialization in Perl
by wfsp (Abbot) on Apr 11, 2010 at 11:40 UTC
    Perhaps consider a disk based hash like DBM::Deep.
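A sketch of what that would look like with DBM::Deep (a CPAN module; file name and key are made up). Nothing is deserialized up front; each lookup goes to the file:

```perl
use strict;
use warnings;
use DBM::Deep;

# First run populates the file; later runs just open it.
my $db = DBM::Deep->new('big_hash.db');
$db->{gene42} = 'some annotation';

# A fresh process opens the file instantly and reads keys on demand.
print $db->{gene42}, "\n";   # prints "some annotation"
```

Startup cost drops to almost nothing; the trade-off is that every lookup becomes a disk read, so this wins only if each run touches a small fraction of the keys.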
Re: Fast(er) serialization in Perl
by The Perlman (Scribe) on Apr 11, 2010 at 12:10 UTC
    Do all keys have the same importance? You may want to use a cache solution holding only the most frequently used keys. If another key is requested, retrieve it from disk and add it to the cache.

    (Again, if you can linearize your data into an array, disk lookup can be very fast, because you can calculate the offset and jump to it with the help of seek.)

    Realizing this with Tie::Hash would keep the interface to your data structure stable and spare you any refactoring.
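A minimal sketch of that tied-hash cache; `%on_disk` and the fetcher sub are hypothetical stand-ins for a real seek()-based disk lookup:

```perl
use strict;
use warnings;

package CachedHash;

sub TIEHASH {
    my ($class, $fetcher) = @_;
    return bless { cache => {}, fetch => $fetcher }, $class;
}

sub FETCH {
    my ($self, $key) = @_;
    # Serve from the in-memory cache; fall back to the slow fetcher.
    $self->{cache}{$key} = $self->{fetch}->($key)
        unless exists $self->{cache}{$key};
    return $self->{cache}{$key};
}

sub STORE  { $_[0]{cache}{ $_[1] } = $_[2] }
sub EXISTS { exists $_[0]{cache}{ $_[1] } }

package main;

# Hypothetical backing store standing in for the on-disk data.
my %on_disk = ( chr1 => 'gene_a', chr2 => 'gene_b' );

tie my %h, 'CachedHash', sub { $on_disk{ $_[0] } };
print $h{chr1}, "\n";   # fetched once, served from the cache afterwards
```

Existing code that reads `$h{...}` keeps working unchanged, which is the whole point of going through tie.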

Re: Fast(er) serialization in Perl
by thezip (Vicar) on Apr 11, 2010 at 14:48 UTC

    You might consider using the profiler Devel::NYTProf to assess where the bottlenecks are in your code.

    It provides a verbose indication of how much time is spent executing code at the subroutine and statement level.
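Assuming Devel::NYTProf is installed from CPAN, a typical run looks like this (the script name is made up):

```shell
perl -d:NYTProf yourscript.pl    # writes ./nytprof.out
nytprofhtml -f nytprof.out       # renders an HTML report
```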


    What can be asserted without proof can be dismissed without proof. - Christopher Hitchens
Re: Fast(er) serialization in Perl
by Anonymous Monk on Apr 11, 2010 at 11:07 UTC
    BerkeleyDB or DB_File might be faster; test it out.
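Both modules present an on-disk hash through tie. Here is a runnable sketch using the core SDBM_File module instead (same tie interface; DB_File is essentially a drop-in replacement that handles larger records). File names are made up:

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;

# Tie a hash to an on-disk database.
tie my %h, 'SDBM_File', 'demo_db', O_RDWR | O_CREAT, 0666
    or die "Cannot tie: $!";

$h{gene42} = 'annotation';    # written to disk, not kept in RAM
print $h{gene42}, "\n";       # prints "annotation"

untie %h;
```

As with the other disk-based suggestions, there is no big up-front load; each access goes to the file, so this wins when each run needs only some of the keys.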