RobertCraven has asked for the wisdom of the Perl Monks concerning the following question:

Masters,

a simple (and probably stupid) question:
Is there a maximum size of a hash? I have a script crashing at a mysterious point, and I am wondering whether a hash of arrays could be the reason.

The hash contains 7400 keys; the array lengths vary between 2 and 65.

Thanks a lot!

Re: maximum size of hash
by zentara (Cardinal) on May 14, 2007 at 14:48 UTC
    I doubt the number of keys is the problem. This script runs fine for me with 100000 keys. You could have millions if you have the RAM.
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %hash;
    for (1..100000) {
        $hash{$_} = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
    }
    print "check mem and hit enter\n";
    <>;
    With 100000 keys it uses 14 MB here; with 1000000 keys it uses 124 MB. Show us a simplified example that demonstrates the problem.
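For scale, here is a rough sketch of a hash of arrays shaped like the one in the question (7400 keys, 2 to 65 elements each). The key names, the random array lengths, and the helper names build_hash_of_arrays and count_elements are made up for illustration; the real data comes from the poster's own file. Even at the upper bound that is 7400 * 65 = 481000 small scalars.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a hash of arrays roughly the size described in the question.
# Key names and random lengths are invented for illustration only.
sub build_hash_of_arrays {
    my ($n_keys, $min_len, $max_len) = @_;
    my %hash;
    for my $i (1 .. $n_keys) {
        my $len = $min_len + int(rand($max_len - $min_len + 1));
        $hash{"key$i"} = [ ('A') x $len ];
    }
    return \%hash;
}

# Count every array element stored under every key.
sub count_elements {
    my ($hash_ref) = @_;
    my $total = 0;
    $total += scalar @{$_} for values %{$hash_ref};
    return $total;
}

my $hoa = build_hash_of_arrays(7400, 2, 65);
printf "%d keys, %d elements in total\n",
    scalar(keys %{$hoa}), count_elements($hoa);
```

Run it and check the process in top or ps while it is still alive (add a `<>;` at the end, as in the script above) to see how little memory a structure of this size needs.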

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
Re: maximum size of hash
by marto (Cardinal) on May 14, 2007 at 14:49 UTC
      Great!

      Thanks a lot!
Re: maximum size of hash
by jettero (Monsignor) on May 14, 2007 at 14:47 UTC
    No, there's no maximum (aside from system limits). 7400*65 doesn't seem nearly big enough to be the problem.

    Usually perl is pretty verbose while it's crashing. Does it say anything that might be helpful? Sometimes I crash a program on purpose to see what it's doing to me (or I did to it):

    use Data::Dumper;
    $Data::Dumper::Indent = $Data::Dumper::Sortkeys = 1;
    die "hrm ... what is going on?!? giant_thing=" . Dumper(\%giant_thing);

    -Paul

Re: maximum size of hash
by RobertCraven (Sexton) on May 14, 2007 at 15:31 UTC
    Thanks for your answers, I also didn't really expect the hash to be the problem.
    Strangely, I had no error output; it just stopped. The script runs for a rather long time (~7 days), and the break came after 46 hours.

    Does the "time" command suppress output?
    time perl createBinarySet.pl >LOG 2> ERROR &
    Nothing in ERROR

    The administrator didn't see anything unusual in the system logs; there was no update, restart, etc.

    Now that I have already written so much, maybe you can see something in the code. The script opens a file, for example:
    '/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent'
    Then several new files are created, in this case:

    1kmc_A.pdb
    1kmc_B.pdb
    1kmc_C.pdb
    1kmc_D.pdb
    1kmc_AB.pdb
    1kmc_AC.pdb
    1kmc_AD.pdb
    ...
    Each of these contains just the lines from the original file that start with ATOM and match the chain ( substr($line,21,1) ).

    #'...or die' removed for better reading
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my(%hash);
    my $startDir = '/home/s0571283/positiveSet/pdb0704071050/pdb/';
    &test();

    sub test(){
        open(POSITIVE_CHAIN_ID, "<POSITIVE_CHAIN_ID");
        #Example line:
        #1KMC A B C D
        while (my $line = <POSITIVE_CHAIN_ID>){
            my @chainIDs = split(/ /,$line);
            my $pdbID = shift(@chainIDs);   #1KMC
            $pdbID =~ tr/[A-Z]/[a-z]/;      #1KMC -> 1kmc
            $hash{$pdbID} = [ @chainIDs ];  #1kmc-> A B C D
        }
        close(POSITIVE_CHAIN_ID);
    }

    foreach my $key (keys %hash){print "$key\n";
        my $dir = substr($key,1,2); #km
        #/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent
        my $pdbFile = $startDir . $dir . '/' . 'pdb' . $key . '.ent';
        open(PDBFILE, "<$pdbFile");
        while(my $line = <PDBFILE>){
            if($line =~ /^ATOM/){
                for my $index ( 0 .. $#{ $hash{$key} } ) {
                    #look for chain ID, eg: 'A'
                    if( substr($line,21,1) eq $hash{$key}[$index]){
                        open(SINGLE, ">>$startDir".$dir.'/'."$key".'_'. "$hash{$key}[$index].pdb");
                        print SINGLE $line;
                        close(SINGLE);
                    }
                    if ($index != $#{ $hash{$key} }){
                        for (my $i = $index+1; $i < $#{$hash{$key}}; $i++){
                            if( substr($line,21,1) eq $hash{$key}[$index] || substr($line,21,1) eq $hash{$key}[$i]){
                                open(DOUBLE, ">>$startDir" . $dir . '/' . $key . '_' . $hash{$key}[$index] . "$hash{$key}[$i].pdb");
                                print DOUBLE $line;
                                close(DOUBLE);
                            }
                        }# for
                    }# if($index != $#{ $hash{$key} })
                }# for my $index ( 0 .. $#{ $hash{$key} } )
            }# if($line =~ /^ATOM/)
        }# while(my $line = <PDBFILE>)
        close(PDBFILE);
    }# foreach


    The crash happened when it was just printing a 'key':
    foreach my $key (keys %hash){print "$key\n";
    Could it be that I am running out of memory? I don't have much experience with scripts that run for such a long time. Can I monitor its memory usage?

    Thanks a lot for your help so far!
      Does the program work better if you feed it less data, i.e. a small manageable dataset? Can you tell whether it is executing all of the lines? Perhaps insert some warn statements and monitor your STDERR to see which is the last to run?
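A minimal sketch of that kind of instrumentation, assuming a Linux-style /proc/self/status for the memory figure (that part is an assumption; elsewhere it simply reports n/a). resident_kb and progress_line are hypothetical helper names, and the two-entry %hash is stand-in data (1abc is invented). STDOUT is block-buffered when redirected to a file, so the last key seen in LOG need not be the last key actually processed; autoflush removes that doubt, and warn writes to STDERR, which is unbuffered by default.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Flush STDOUT after every write so the redirected LOG file is current.
STDOUT->autoflush(1);

# Return the resident memory of this process in kB, or undef.
# Assumption: a Linux-style /proc/self/status with a "VmRSS:" line.
sub resident_kb {
    open(my $fh, '<', '/proc/self/status') or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

# Build one progress line (hypothetical helper, not from the original script).
sub progress_line {
    my ($done, $total, $key) = @_;
    my $rss = resident_kb();
    return sprintf("[%s] %d/%d key=%s rss=%s",
        scalar(localtime), $done, $total, $key,
        defined $rss ? "${rss}kB" : 'n/a');
}

# Stand-in data; the real script would loop over its own %hash.
my %hash = (
    '1kmc' => [qw(A B C D)],
    '1abc' => [qw(A B)],
);
my $total = keys %hash;
my $done  = 0;
for my $key (sort keys %hash) {
    $done++;
    warn progress_line($done, $total, $key), "\n";
    # ... per-file work would go here ...
}
```

With the original invocation (`2> ERROR`), these lines end up in ERROR, so the last line written there shows which key was being handled and roughly how much memory the process held at that point.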
        I am reading records from an Excel file into a Perl hash. Many files and hashes are involved, but it appears that Perl runs out of space near the end of some hashes. If I limit the records, then it runs fine.

        Does anyone know of a limit on Perl hashes, or Perl hashes when read from Excel spreadsheets? It seems to die around 670 and stop reading remaining records.

        thanks

Re: maximum size of hash
by Anonymous Monk on Apr 21, 2023 at 10:19 UTC
    What version of perl?