In general: Your problem might not be the keys, but the values of your hash.
#!/usr/bin/perl
use warnings;
use strict;
use Devel::Size qw(size);

my %hash;
for my $key (1 .. 27_000) {
    $hash{"blahblahblahblah$key"} = 1;
}
print size(\%hash);
The keys here are really not small, and they contain a lot of redundant data. But size reports only 1,736,142 bytes for the hash, and it has 27,000 entries like yours. I think you're optimizing the wrong part of your data structure.
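One caveat worth knowing: Devel::Size's size() does not follow references, so if your values are themselves references (nested hashes, say), it won't count what they point to; total_size() does. A quick sketch of the difference, using the same kind of hash:

#!/usr/bin/perl
use warnings;
use strict;
use Devel::Size qw(size total_size);

my %hash;
for my $key (1 .. 27_000) {
    # store a reference as the value instead of a plain scalar
    $hash{"blahblahblahblah$key"} = { count => 1 };
}
print "size:       ", size(\%hash), "\n";          # keys and hash structure only
print "total_size: ", total_size(\%hash), "\n";    # follows the value references

If the two numbers differ wildly, the memory is going into the values, not the keys.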
Example of what I'm trying to do, and of the data I'm handling:
In generate.pl, freq_tuple3 is the 3-key hash I was talking about.
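For reference, a three-level frequency hash of that sort might look like the sketch below (generate.pl isn't shown here, so the exact shape of freq_tuple3 is an assumption):

#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical shape of a 3-key frequency hash like freq_tuple3:
# $freq{$w1}{$w2}{$w3} counts occurrences of the trigram ($w1, $w2, $w3).
my %freq;
my @words = qw(the cat sat the cat ran the cat sat);
for my $i (0 .. $#words - 2) {
    $freq{ $words[$i] }{ $words[$i + 1] }{ $words[$i + 2] }++;
}
print $freq{the}{cat}{sat}, "\n";    # prints 2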
Use a database if you'd like to work with large data sets. That's true for any language.
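A minimal sketch of that approach using DBI with SQLite (the freq table, its columns, and the trigram data are made up for illustration; requires DBD::SQLite):

#!/usr/bin/perl
use warnings;
use strict;
use DBI;

my $dbh = DBI->connect("dbi:SQLite:dbname=freq.db", "", "",
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do("CREATE TABLE IF NOT EXISTS freq (
              w1 TEXT, w2 TEXT, w3 TEXT, n INTEGER,
              PRIMARY KEY (w1, w2, w3))");

# Count one trigram: try the update first, insert on first sighting.
my $upd = $dbh->prepare("UPDATE freq SET n = n + 1 WHERE w1 = ? AND w2 = ? AND w3 = ?");
my $ins = $dbh->prepare("INSERT INTO freq (w1, w2, w3, n) VALUES (?, ?, ?, 1)");
my @trigram = ("the", "cat", "sat");
$ins->execute(@trigram) if $upd->execute(@trigram) == 0;
$dbh->disconnect;

That way only the trigrams you actually see take up space, and the counts survive between runs.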
Your data set could include up to 28,000³ = 21,952,000,000,000 points if fully populated. Even though you are working with a sparsely populated version of that set, the number is still going to be huge. If we assume that only 1/5 of the available words can appear in each of the three positions, that still leaves (28,000/5)³ = 175,616,000,000 data points to count. You are suffering from a combinatorial explosion.
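A back-of-the-envelope check of those figures:

#!/usr/bin/perl
use warnings;
use strict;

my $words = 28_000;
printf "fully populated: %.0f\n", $words ** 3;          # 21952000000000
printf "1/5 per slot:    %.0f\n", ($words / 5) ** 3;    # 175616000000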
Zen recommended using a database. I think that is good advice. MLDBM looks like a nice fit.
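A minimal sketch of what using MLDBM could look like here (the file name and the trigram are made up for illustration):

#!/usr/bin/perl
use warnings;
use strict;
use MLDBM qw(DB_File Storable);    # DB_File backend, Storable serializer
use Fcntl;

my %freq;
tie %freq, 'MLDBM', 'freq.dbm', O_CREAT | O_RDWR, 0640
    or die "tie failed: $!";

# MLDBM caveat: update through a temporary copy and reassign,
# because changes to inner structures aren't written back automatically.
my $inner = $freq{"the"} || {};
$inner->{"cat"}{"sat"}++;
$freq{"the"} = $inner;
untie %freq;

That keeps the nested-hash interface you already have while pushing the data out to disk.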
Update: Calculations based on 28000 words, not 27000. Credited Zen by name for his advice.