Re^4: Fast(er) serialization in Perl

This is basically what the hash looks like; it goes on for about 6 million lines (genes):

$VAR1 = {
          'microT' => {
                        'mmu-miR-704' => {
                                           'NM_009309' => '1',
                                           'NM_133983' => '1',
                                           'NM_175563' => '1',
                                           'NM_010889' => '1',
                                           'NM_008302' => '1',
                                           'NM_022023' => '1',
                                           'NM_009567' => '1',
                                           'NM_172938' => '1',
                                           'NM_029777' => '3',
                                           'NM_134189' => '1',
                                           'NM_175025' => '1',
                                           'NM_177327' => '1',
                                           'NM_026807' => '1',
                                           'NM_178779' => '3',
                                           'NM_010770' => '1',
                                           'NM_031998' => '1',
                                           'NM_145584' => '2',
                                           'NM_207682' => '1',
                                           'NM_001005525' => '1',
                                           'NM_080853' => '1',
                                           'NM_145519' => '1',
                                           'NM_031249' => '1',
                                           'NM_172923' => '1',
                                           'NM_001008700' => '1',
                                           'NM_198617' => '1',
                                           'NM_027400' => '1',
                                           'NM_026406' => '2',
                                           'NM_021296' => '2',
                                           'NM_027652' => '1',
                                           'NM_001045530' => '1',
                                           'NM_018830' => '1',
                                           'NM_025314' => '1',
                                           'NM_009041' => '1',
                                           'NM_026829' => '3',
                                           'NM_026618' => '1',
                                           'NM_027472' => '1',
                                           'NM_027870' => '1',
                                           'NM_001033239' => '1',
                                           'NM_026348' => '1',
                                           'NM_008223' => '1',
                                           'NM_009595' => '2',
                                           'NM_146094' => '1',
                                           'NM_144945' => '1',
                                           'NM_019510' => '1',
                                           'NM_001033251' => '1',
                                           'NM_001081213' => '3',
                                           'NM_008031' => '1',
                                           'NM_028719' => '1',
                                           'NM_133352' => '1',
                                           'NM_008133' => '1',
                                           'NM_008317' => '1',
                                           'NM_021327' => '1',
                                           'NM_178751' => '1',
                                           'NM_010260' => '1',
                                           'NM_025683' => '1',
                                           'NM_026383' => '1',
                                           'NM_001081367' => '1',
                                           'NM_001033354' => '2',
                                           'NM_026034' => '1',
                                           'NM_173395' => '1',
                                           'NM_010762' => '1',
                                           'NM_024432' => '1',
                                           'NM_175113' => '1',
                                           'NM_001077425' => '1',
                                           'NM_026374' => '1',
                                           'NM_026655' => '1',
                                           'NM_177345' => '1',
                                           'NM_027412' => '1',
                                           'NM_183187' => '1',
                                           'NM_016687' => '1',
                                           'NM_175640' => '1',
                                           'NM_007559' => '1',
                                           'NM_011269' => '1',
                                           'NM_010252' => '1',
                                           'NM_019657' => '1',

I'm not very familiar with Tie::Hash, so if you can give me a quick heads-up on how I can use it to save time, that would be great.


Re^5: Fast(er) serialization in Perl by The Perlman (Scribe) on Apr 11, 2010 at 13:47 UTC
> I'm not very familiar with Tie::Hash, so if you can give me a quick heads up on how I can use it to save time it would be great

Wouldn't it be even greater if you tried to read the detailed docs and told us what you don't understand? ;-)

Your hash really looks like a perverted array ... try to figure out how many lookups are performed and whether they can be grouped into smaller data structures.

BTW: If your university prefers PHP but accepts tying up large parts of the RAM (6 million hash entries can easily result in 1 GB or more of memory consumption), something seems terribly wrong...
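To make the "smaller data structures" idea concrete, here is a minimal sketch that stores each miRNA's target hash in its own Storable file, so a lookup only deserializes the part it needs. The file layout, the helper names `file_for` and `targets_for`, and the second miRNA name are assumptions for illustration; nothing in the thread says the data is stored this way.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);

# Scratch directory for the sketch; a real script would use a fixed path.
my $dir = tempdir(CLEANUP => 1);

# Tiny stand-in for the 6-million-entry hash.
my %data = (
    microT => {
        'mmu-miR-704'     => { NM_009309 => 1, NM_029777 => 3 },
        'example-miRNA-2' => { NM_145584 => 2 },   # placeholder name, not real data
    },
);

# Hypothetical naming scheme: one file per (source, miRNA) pair.
# Assumes both names are safe to use in a file name.
sub file_for {
    my ($source, $mirna) = @_;
    return "$dir/$source.$mirna.sto";
}

# Build step: write each inner hash to its own file.
for my $source (keys %data) {
    for my $mirna (keys %{ $data{$source} }) {
        nstore($data{$source}{$mirna}, file_for($source, $mirna));
    }
}

# Lookup step: load only the one small hash that is needed.
sub targets_for {
    my ($source, $mirna) = @_;
    my $file = file_for($source, $mirna);
    return {} unless -e $file;
    return retrieve($file);
}

my $targets = targets_for('microT', 'mmu-miR-704');
print "NM_029777 => $targets->{NM_029777}\n";
```

Whether this helps depends on how many different miRNAs a single run touches; if most of them are needed every time, splitting the file saves little.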
Re^5: Fast(er) serialization in Perl by The Perlman (Scribe) on Apr 11, 2010 at 14:11 UTC
Is 'NM_001005525' different from 'NM_1005525'? If not, you have a serious bug...

If yes: 90% of your keys have 6 digits, so putting this data into an array with 1 million entries seems reasonable, resulting in 2 MB of memory consumption if you can limit your values to 65536 numbers (it's a counter, isn't it?).

The other keys have 9 digits, so it seems you are coding your genes in groups of 3 digits. All of them start with "001".

So generally, from what you show, a hash of arrays seems reasonable, where the hash key represents the first 3 digits and the array the rest.
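A rough sketch of that layout follows. An ordinary Perl array of a million scalars takes far more than 2 MB, so the sketch keeps each "array" as a packed string of 16-bit counters accessed with `vec`. The helper names `split_id`, `set_count` and `get_count` are made up here, and the rule "last 6 digits are the index, anything before them is the hash key" is an assumption about the ID format, not something confirmed in the thread.

```perl
use strict;
use warnings;

# Bucket prefix => packed string holding one 16-bit counter per index.
my %counts;

# Split 'NM_001005525' into ('001', 5525) and 'NM_009309' into ('', 9309).
# Assumption: the last 6 digits are the array index, the rest is the bucket.
sub split_id {
    my ($id) = @_;
    my ($prefix, $index) = $id =~ /\ANM_(\d*?)(\d{6})\z/
        or die "unexpected accession format: $id";
    return ($prefix, $index + 0);
}

sub set_count {
    my ($id, $n) = @_;
    die "count out of 16-bit range: $n" if $n < 0 || $n > 65535;
    my ($prefix, $index) = split_id($id);
    $counts{$prefix} = '' unless defined $counts{$prefix};
    vec($counts{$prefix}, $index, 16) = $n;   # string grows as needed
}

# Returns 0 for IDs that were never set, so 0 doubles as "absent".
sub get_count {
    my ($id) = @_;
    my ($prefix, $index) = split_id($id);
    return 0 unless defined $counts{$prefix};
    return vec($counts{$prefix}, $index, 16);
}

set_count('NM_009309',    1);
set_count('NM_029777',    3);
set_count('NM_001005525', 1);

print "NM_029777 => ", get_count('NM_029777'), "\n";
```

Each bucket string tops out at 2,000,000 bytes (1 million indexes at 2 bytes each), which is where the 2 MB figure comes from. A packed string like this can also be written to and read from disk as-is, without a serializer.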
Re^6: Fast(er) serialization in Perl by mrguy123 (Hermit) on Apr 11, 2010 at 15:16 UTC
OK, I understand what you are saying. I will try this direction and hope I manage to save some time. Thanks for your help.
Re^7: Fast(er) serialization in Perl by The Perlman (Scribe) on Apr 11, 2010 at 18:23 UTC
BTW: Do you use this hash data read-only? If not, you should take care of simultaneous calls of your script. And if you're only accessing a relatively "small" number of entries, better choose one of the flat-file DB solutions mentioned above.
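As one possible shape for the flat-file route, the sketch below ties a hash to an SDBM file (SDBM_File ships with Perl) and guards access with `flock` so that simultaneous runs do not step on each other. The flattened `source|miRNA|gene` key, the separate lock file and the helper names `key_for` and `lookup` are assumptions for illustration; the DB solutions mentioned earlier in the thread may differ.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;
use File::Temp qw(tempdir);

# Scratch location for the sketch; SDBM creates targets.pag and targets.dir.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/targets";

# Flatten the nested hash path into a single DBM key (hypothetical scheme).
sub key_for { return join '|', @_ }

# Build step: take an exclusive lock while writing.
{
    open my $lock, '>>', "$base.lock" or die "cannot open lock file: $!";
    flock($lock, LOCK_EX) or die "cannot lock: $!";
    tie(my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
        or die "cannot tie $base: $!";
    $db{ key_for('microT', 'mmu-miR-704', 'NM_009309') } = 1;
    $db{ key_for('microT', 'mmu-miR-704', 'NM_029777') } = 3;
    untie %db;
    close $lock;
}

# Lookup step: a shared lock is enough for reading, and only the
# requested entry is fetched from disk.
sub lookup {
    my @path = @_;
    open my $lock, '<', "$base.lock" or die "cannot open lock file: $!";
    flock($lock, LOCK_SH) or die "cannot lock: $!";
    tie(my %db, 'SDBM_File', $base, O_RDONLY, 0666)
        or die "cannot tie $base: $!";
    my $value = $db{ key_for(@path) };
    untie %db;
    close $lock;
    return $value;
}

print "NM_029777 => ", lookup('microT', 'mmu-miR-704', 'NM_029777'), "\n";
```

The script never loads the whole data set, so start-up cost no longer depends on the 6 million entries. SDBM limits the size of each key/value pair, which is fine for short keys and small counters like these.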