in reply to maximum size of hash

Thanks for your answers. I didn't really expect the hash to be the problem either.
Strangely, there was no error output; the script just stopped. It runs for a long time (about 7 days), and it stopped after 46 hours.

Does the "time" command suppress output? I start the script with:
time perl createBinarySet.pl >LOG 2> ERROR &
There is nothing in ERROR.

The administrator didn't see anything unusual in the system logs, and there was no update or restart.

Since I've already written this much, maybe you can spot something in the code. The script opens a file, for example:
'/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent'
Then several new files are created, in this case:

1kmc_A.pdb
1kmc_B.pdb
1kmc_C.pdb
1kmc_D.pdb
1kmc_AB.pdb
1kmc_AC.pdb
1kmc_AD.pdb
...
Each of these files contains only the ATOM lines from the original file whose chain ID ( substr($line,21,1) ) matches the file's chain(s).

#!/usr/bin/perl
#'...or die' removed for better reading
use warnings;
use strict;
use Data::Dumper;

my(%hash);
my $startDir = '/home/s0571283/positiveSet/pdb0704071050/pdb/';

&test();

sub test(){
    open(POSITIVE_CHAIN_ID, "<POSITIVE_CHAIN_ID");
    #Example line:
    #1KMC A B C D
    while (my $line = <POSITIVE_CHAIN_ID>){
        my @chainIDs = split(/ /,$line);
        my $pdbID = shift(@chainIDs);    #1KMC
        $pdbID =~ tr/[A-Z]/[a-z]/;       #1KMC -> 1kmc
        $hash{$pdbID} = [ @chainIDs ];   #1kmc-> A B C D
    }
    close(POSITIVE_CHAIN_ID);
}

foreach my $key (keys %hash){
    print "$key\n";
    my $dir = substr($key,1,2);   #km
    #/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent
    my $pdbFile = $startDir . $dir . '/' . 'pdb' . $key . '.ent';
    open(PDBFILE, "<$pdbFile");
    while(my $line = <PDBFILE>){
        if($line =~ /^ATOM/){
            for my $index ( 0 .. $#{ $hash{$key} } ) {
                #look for chain ID, eg: 'A'
                if( substr($line,21,1) eq $hash{$key}[$index]){
                    open(SINGLE, ">>$startDir".$dir.'/'."$key".'_'. "$hash{$key}[$index].pdb");
                    print SINGLE $line;
                    close(SINGLE);
                }
                if ($index != $#{ $hash{$key} }){
                    for (my $i = $index+1; $i < $#{$hash{$key}}; $i++){
                        if( substr($line,21,1) eq $hash{$key}[$index] || substr($line,21,1) eq $hash{$key}[$i]){
                            open(DOUBLE, ">>$startDir" . $dir . '/' . $key . '_' . $hash{$key}[$index] . "$hash{$key}[$i].pdb");
                            print DOUBLE $line;
                            close(DOUBLE);
                        }
                    }# for
                }# if($index != $#{ $hash{$key} })
            }# for my $index ( 0 .. $#{ $hash{$key} } )
        }# if($line =~ /^ATOM/)
    }# while(my $line = <PDBFILE>)
    close(PDBFILE);
}# foreach
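The file-naming scheme described above can be illustrated without any file I/O. This is a minimal sketch, not the poster's code: `output_names` is a hypothetical helper, and it assumes each line of POSITIVE_CHAIN_ID has the form shown in the `#1KMC A B C D` comment. It chomps the line first, because `split(/ /, $line)` leaves the trailing newline attached to the last chain ID ("D\n"), and it pairs every chain with every later one, including the last (AD, BD, CD).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given one line of POSITIVE_CHAIN_ID
# (e.g. "1KMC A B C D\n"), return the output file names for
# every single chain and every unordered pair of chains.
sub output_names {
    my ($line) = @_;
    chomp $line;                                # drop the trailing newline
    my ($pdbID, @chains) = split ' ', $line;    # split on whitespace
    $pdbID = lc $pdbID;                         # 1KMC -> 1kmc

    my @names = map { "${pdbID}_$_.pdb" } @chains;    # single chains
    for my $i (0 .. $#chains) {
        for my $j ($i + 1 .. $#chains) {               # pairs with j > i
            push @names, "${pdbID}_$chains[$i]$chains[$j].pdb";
        }
    }
    return @names;
}

print "$_\n" for output_names("1KMC A B C D\n");
```

For the example line this yields the four single-chain names (1kmc_A.pdb to 1kmc_D.pdb) followed by the six pairs (1kmc_AB.pdb to 1kmc_CD.pdb).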


The crash happened while it was printing a key:
foreach my $key (keys %hash){print "$key\n";
Could I be running out of memory? I don't have much experience with scripts that run this long; is there a way to monitor their memory usage?
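One way to watch memory use from inside the script is sketched here. It assumes Linux, where /proc/self/status reports the process's resident set size on a `VmRSS:` line in kB; `rss_kb` is a hypothetical helper that returns undef where that file is unavailable. From a shell, `top` or `ps -o rss= -p <pid>` shows the same figure for a running process.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch, assuming Linux: read this process's resident set size
# (VmRSS, in kB) from /proc/self/status. Returns undef elsewhere.
sub rss_kb {
    open(my $fh, '<', '/proc/self/status') or return undef;
    while (my $line = <$fh>) {
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            my $kb = $1;
            close $fh;
            return $kb;
        }
    }
    close $fh;
    return undef;
}

# Example: log memory use to STDERR, e.g. once per processed key.
my $rss = rss_kb();
warn defined $rss ? "RSS: $rss kB\n" : "RSS: unavailable\n";
```

Calling `rss_kb()` once per key and writing the value to STDERR would show whether memory grows steadily over the 46 hours.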

Thanks a lot for your help so far!

Re^2: maximum size of hash
by ff (Hermit) on May 15, 2007 at 00:31 UTC
    Does the program work better if you feed it less data, i.e. a small, manageable dataset? Can you tell whether it is executing all of the lines? Perhaps insert some warn statements and watch STDERR to see which one runs last.
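    A minimal sketch of that idea, with a hypothetical `progress_line` helper and stand-in keys. One detail matters for the original setup: when STDOUT is redirected to a file (`>LOG`), Perl block-buffers it, so the last key visible in LOG need not be the last key processed. STDERR is unbuffered, and setting `$|` to 1 makes the currently selected handle (STDOUT) flush after every print.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;   # autoflush STDOUT, so a redirected LOG is written line by line

# Hypothetical helper: a timestamped progress line for one key.
sub progress_line {
    my ($count, $key) = @_;
    return sprintf("[%s] #%d %s\n", scalar(localtime), $count, $key);
}

# Example loop: STDERR (where warn writes) is unbuffered, so the
# last line in ERROR shows the last key that was actually started.
my $count = 0;
for my $key (qw(1kmc 1abc)) {      # stand-in keys for illustration
    warn progress_line(++$count, $key);
    # ... process $key here ...
}
```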
      I am reading records from an Excel file into a Perl hash. Many files and hashes are involved, but it appears that Perl runs out of space near the end of some hashes. If I limit the number of records, it runs fine.

      Does anyone know of a limit on Perl hashes, or on hashes filled from Excel spreadsheets? It seems to die at around 670 records and stops reading the remaining ones.

      thanks

        Does anyone know of a limit on Perl hashes

        The very first reply in this entire thread is correct. There is no limit other than the space your system might have to store it.
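        A quick way to convince yourself is to fill a hash with far more than 670 entries. This is only a sketch; the count of 100_000 is arbitrary and is bounded only by available memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fill a hash with many entries; the only practical limit is memory.
my %h;
my $n = 100_000;
$h{"key$_"} = $_ for 1 .. $n;

print scalar(keys %h), " keys\n";   # prints "100000 keys"
```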

        when read from Excel spreadsheets? It seems to die around 670

        Save your spreadsheet as CSV and read from that instead. This might save a lot of trouble, and it would also help you prepare the SSCCE that would be required to dig further.


        🦛