I doubt the number of keys is the problem. This script runs fine for me with 100000 keys, and you could have millions if you have the RAM.
#!/usr/bin/perl
use warnings;
use strict;
my %hash;
for(1..100000){
$hash{$_}= 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
}
print "check mem and hit enter\n";
<>;
For 100000 keys it uses 14 MB; 1000000 keys use 124 MB.
Show us a simplified example which demonstrates the problem.
No, there's no maximum (aside from system limits). 7400*65 doesn't seem nearly big enough to be the problem.
Usually perl is pretty verbose while it's crashing. Does it say anything that might be helpful?
Sometimes I crash a program on purpose to see what it's doing to me (or I did to it):
use Data::Dumper; $Data::Dumper::Indent = $Data::Dumper::Sortkeys = 1;
die "hrm ... what is going on?!? giant_thing=" . Dumper(\%giant_thing);
Thanks for your answers. I didn't really expect the hash to be the problem either.
Strangely, I got no error output; it just stopped. The full run takes rather long (~7 days), and it broke off after 46 hours.
Does the "time" command suppress output?
time perl createBinarySet.pl >LOG 2> ERROR &
Nothing in ERROR
The administrator didn't see anything unusual in the system logs; there was no update, restart, etc.
Now that I have already written so much, maybe you can see something in the code. The script opens a file, for example:
'/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent'
Then several new files are created, in this case:
1kmc_A.pdb
1kmc_B.pdb
1kmc_C.pdb
1kmc_D.pdb
1kmc_AB.pdb
1kmc_AC.pdb
1kmc_AD.pdb
...
Each contains only the lines starting with ATOM from the original file that match the chain(s) in its name ( substr($line,21,1) ).
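To make the chain test concrete: in the fixed-column PDB format the chain ID sits in column 22, which is offset 21 for substr. A minimal sketch follows; the sample record is built with sprintf purely for illustration (the atom and residue values are made up, not taken from pdb1kmc.ent), and make_atom_line and chain_of are hypothetical helper names.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Build an illustrative ATOM record; the values are made up for the demo.
# Columns: 1-6 record name, 7-11 serial, 13-16 atom name, 17 altLoc,
# 18-20 residue name, 22 chain ID, 23-26 residue number.
sub make_atom_line {
    my ($serial, $chain) = @_;
    return sprintf("%-6s%5d %-4s%1s%3s %1s%4d\n",
                   'ATOM', $serial, ' N  ', ' ', 'MET', $chain, 1);
}

# Chain ID of a record, read the same way the script reads it.
sub chain_of {
    my ($line) = @_;
    return substr($line, 21, 1);
}

for my $chain (qw(A B)) {
    my $line = make_atom_line(1, $chain);
    print "chain ", chain_of($line), ": $line";
}
```

Since substr returns exactly one character, the value it is compared against must also be a bare single letter with no trailing whitespace.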
#'...or die' removed for better reading
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
my(%hash);
my $startDir = '/home/s0571283/positiveSet/pdb0704071050/pdb/';
&test();
sub test(){
    open(POSITIVE_CHAIN_ID, "<POSITIVE_CHAIN_ID");
    #Example line:
    #1KMC A B C D
    while (my $line = <POSITIVE_CHAIN_ID>){
        my @chainIDs = split(/ /,$line);
        my $pdbID = shift(@chainIDs); #1KMC
        $pdbID =~ tr/[A-Z]/[a-z]/;#1KMC -> 1kmc
        $hash{$pdbID} = [ @chainIDs ]; #1kmc-> A B C D
    }
    close(POSITIVE_CHAIN_ID);
}
foreach my $key (keys %hash){print "$key\n";
    my $dir = substr($key,1,2); #km
    #/home/s0571283/positiveSet/pdb0704071050/pdb/km/pdb1kmc.ent
    my $pdbFile = $startDir . $dir . '/' . 'pdb' . $key . '.ent';
    open(PDBFILE, "<$pdbFile");
    while(my $line = <PDBFILE>){
        if($line =~ /^ATOM/){
            for my $index ( 0 .. $#{ $hash{$key} } ) {
                #look for chain ID, eg: 'A'
                if( substr($line,21,1) eq $hash{$key}[$index]){
                    open(SINGLE, ">>$startDir".$dir.'/'."$key".'_'. "$hash{$key}[$index].pdb");
                    print SINGLE $line;
                    close(SINGLE);
                }
                if ($index != $#{ $hash{$key} }){
                    for (my $i = $index+1; $i < $#{$hash{$key}}; $i++){
                        if( substr($line,21,1) eq $hash{$key}[$index] || substr($line,21,1) eq $hash{$key}[$i]){
                            open(DOUBLE, ">>$startDir" . $dir . '/' . $key . '_' . $hash{$key}[$index] . "$hash{$key}[$i].pdb");
                            print DOUBLE $line;
                            close(DOUBLE);
                        }
                    }# for
                }# if($index != $#{ $hash{$key} })
            }# for my $index ( 0 .. $#{ $hash{$key} } )
        }# if($line =~ /^ATOM/)
    }# while(my $line = <PDBFILE>)
    close(PDBFILE);
}# foreach
The crash happened when it was just printing a 'key':
foreach my $key (keys %hash){print "$key\n";
Could it be that I am running out of memory? I don't have much experience with scripts running for such a long time, can I monitor it (memory usage)?
Thanks a lot for your help so far!
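Probably unrelated to the crash, but two details in the posted loops look off to me. First, split(/ /,$line) is applied without a chomp, so for "1KMC A B C D" the last chain ID is stored with its newline and can never equal substr($line,21,1). Second, the inner loop runs while $i < $#{$hash{$key}}, which stops one short of the last chain, so pairs such as AD are never produced. A minimal sketch of the parsing and the file-name generation, under the assumption that every single chain and every unordered pair is wanted; parse_chain_line and output_names are hypothetical helper names, not part of the original script:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Parse one line of the chain list, e.g. "1KMC A B C D\n".
# split ' ' splits on any run of whitespace, so no stray newline survives
# (the chomp is belt and braces); split(/ /, ...) would leave "D\n".
sub parse_chain_line {
    my ($line) = @_;
    chomp $line;
    my ($pdbID, @chainIDs) = split ' ', $line;
    return (lc $pdbID, @chainIDs);
}

# Names of all single-chain and two-chain output files for one entry.
sub output_names {
    my ($pdbID, @chains) = @_;
    my @names = map { "${pdbID}_$_.pdb" } @chains;
    for my $i (0 .. $#chains) {
        for my $j ($i + 1 .. $#chains) {    # upper bound is inclusive
            push @names, "${pdbID}_$chains[$i]$chains[$j].pdb";
        }
    }
    return @names;
}

my ($id, @chains) = parse_chain_line("1KMC A B C D\n");
print "$_\n" for output_names($id, @chains);
```

For four chains this lists four single-chain names and six pair names. Knowing the names up front would also make it possible to open each output file once per PDB entry instead of once per ATOM line, which may matter for a run that takes days.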
Does the program work better if you feed it less data, i.e. a small manageable dataset? Can you tell whether it is executing all of the lines? Perhaps insert some warn statements and monitor your STDERR to see which is the last to run?
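As far as I know, time does not swallow any output of the program it runs. What can happen is that STDOUT is block-buffered when redirected to a file, so the last lines may never reach LOG if the process dies. A minimal sketch of the kind of tracing meant here, which also reports memory use: it assumes Linux (it reads the VmRSS line from /proc/<pid>/status), and rss_kb, progress_line and the two sample keys are made up for illustration.

```perl
#!/usr/bin/perl
use warnings;
use strict;

$| = 1;    # flush STDOUT after every print, so a redirected LOG stays current

# Resident memory of this process in kB, or undef if it cannot be read.
# Assumption: Linux, where /proc/<pid>/status has a "VmRSS:" line.
sub rss_kb {
    open(my $fh, '<', "/proc/$$/status") or return;
    while (my $line = <$fh>) {
        if ($line =~ /^VmRSS:\s+(\d+)\s+kB/) {
            close($fh);
            return $1;
        }
    }
    close($fh);
    return;
}

# Build one progress line: timestamp, running count, current key, memory.
sub progress_line {
    my ($count, $key) = @_;
    my $rss = rss_kb();
    return sprintf("%s key #%d (%s) rss=%s kB",
                   scalar(localtime()), $count, $key,
                   defined $rss ? $rss : 'n/a');
}

# Stand-in data; in the real script this would be the filled %hash.
my %hash = ('1kmc' => [qw(A B C D)], '1abc' => [qw(A B)]);

my $count = 0;
foreach my $key (sort keys %hash) {
    $count++;
    warn progress_line($count, $key), "\n";    # STDERR, i.e. the ERROR file
    # ... per-file work would go here ...
}
```

Because warn writes to STDERR, which is not buffered, the last line in ERROR shows which key was being processed when the run stopped, and a steadily growing rss value would point towards a memory problem.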