in reply to tying a hash from a big dictionary

I use this code to read my dictionary:

You are using far more memory (double, maybe even triple, the requirement) because of the way you are returning the data from your subroutine.

It may not be enough to relieve your out-of-memory situation, but try this before you seek other more complex and inevitably slower solutions:

    sub read_dict {
        my $file = shift;
        my %dict;
        open( my $fh, "<:encoding(utf8)", $file )
            or die "Cannot open '$file': $!";
        while ( <$fh> ) {
            chomp;                          ## no need to chomp twice
            my ( $p1, $p2 ) = split /\t/;
            push @{ $dict{ $p1 } }, $p2;
        }
        close $fh;
        return \%dict;  ## main space saving change; return a ref to the hash
    }
    ...
    my $dict = read_dict( $dict_name );
    ...
    for my $next_phrase ( @{ $dict->{ $key } } ) {
        ...
    }
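The saving comes from how Perl passes data: return %dict flattens the whole hash into a list of key/value pairs, which the caller then copies into a brand-new hash, so the data briefly exists twice (plus the temporary list). Returning \%dict hands back a single scalar reference instead. A minimal, self-contained sketch of the reference-returning style, reading from an in-memory filehandle instead of a real file (the data and names here are illustrative):

```perl
use strict;
use warnings;

# Same shape as read_dict above, but the filehandle is passed in so the
# example needs no file on disk.
sub read_dict_from_fh {
    my $fh = shift;
    my %dict;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $p1, $p2 ) = split /\t/, $line;
        push @{ $dict{$p1} }, $p2;
    }
    return \%dict;    # a single reference crosses the sub boundary
}

# Open a read handle on a string instead of a file (perl >= 5.8).
my $data = "hello\tbonjour\nhello\tsalut\nbye\tau revoir\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my $dict = read_dict_from_fh($fh);

# Each value is an array reference; dereference it to loop.
for my $next_phrase ( @{ $dict->{'hello'} } ) {
    print "$next_phrase\n";    # bonjour, then salut
}
```

Note that the caller must now use arrow syntax ($dict->{$key}) and dereference each value, but nothing is copied until you actually touch it.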

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: tying a hash from a big dictionary
by Anonymous Monk on Oct 31, 2011 at 13:53 UTC
    That was a nice one, thanks! I still have the memory problem, but this tip saved me a lot as well!

      How many lines does your file have? How many of those do you succeed in loading before you run out of memory?

        I have around 200M lines. I don't know after how many lines I run out of memory, since I haven't measured it yet.
      On a 4 GB machine, it will run out of memory after about 5M dictionary lines.
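      At 200M lines on a 4 GB machine, no amount of in-memory trimming will fit, so the usual next step (and what the thread title asks about) is to tie the hash to an on-disk DBM file, keeping only the pages you touch in RAM. A minimal sketch using the core SDBM_File module (DB_File or BerkeleyDB scale much better for files this size; SDBM also limits each entry to roughly 1 KB). DBM values are plain strings, so multiple translations per phrase are joined with a separator here; all names and the separator choice are illustrative:

      ```perl
      use strict;
      use warnings;
      use Fcntl;        # for O_RDWR, O_CREAT
      use SDBM_File;    # core module; DB_File scales better if available
      use File::Temp qw(tempdir);

      my $dir = tempdir( CLEANUP => 1 );

      # Tie %dict to an on-disk SDBM file: entries live on disk, not in RAM.
      tie my %dict, 'SDBM_File', "$dir/dict", O_RDWR | O_CREAT, 0666
          or die "Cannot tie dictionary: $!";

      # A DBM value is a single string, so append each new translation
      # with a separator byte unlikely to occur in the data.
      my $sep = "\x{1}";
      for my $pair ( [ hello => 'bonjour' ], [ hello => 'salut' ] ) {
          my ( $p1, $p2 ) = @$pair;
          $dict{$p1} = defined $dict{$p1} ? $dict{$p1} . $sep . $p2 : $p2;
      }

      # Split the stored string back into the list of translations.
      my @translations = split /\Q$sep\E/, $dict{'hello'};
      print "$_\n" for @translations;    # bonjour, then salut

      untie %dict;
      ```

      Lookups against a tied DBM file are slower than a pure in-memory hash, but the working set no longer has to fit in 4 GB.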