The output of friendoffriend.pl is a comma-separated list of strings. If I've already seen a string in a previous run of friendoffriend.pl, I don't want to run friendoffriend.pl on it again. The MySQL table AllJoinRecip contains more than 1 million records, so I need an efficient way to store and search the strings I've already found. As you can see below, I've been storing them in a hash and checking whether each key exists. Running this program took about 3 hours 30 minutes to process all of the data. friendoffriend.pl itself runs very quickly, so I'm wondering whether the large hash is slowing the program down. Will a hash with over a million entries significantly impact execution time? If so, what are some alternatives that could speed things up? I have MySQL at my disposal, so feel free to offer suggestions that utilize it. I can also modify the AllJoinRecip table as necessary, since the table is easy to recreate (it takes about 5 minutes to build, though). Any other optimization suggestions are greatly appreciated. Thanks!

-gunr

    my %processed;   # strings already returned by friendoffriend.pl
    my $sth = $dbh->prepare("SELECT qseqid FROM AllJoinRecip");
    $sth->execute() or die("execute failed " . $sth->errstr());
    my $i = 1;
    while (my ($seq) = $sth->fetchrow_array()) {
        next if exists $processed{$seq};
        my $out = `./friendoffriend.pl "$seq"`;
        chomp $out;   # strip the trailing newline so keys match qseqid values
        my @results = split(',', $out);
        $processed{$_} = 1 foreach @results;
        print "Cluster $i: $out\n";
        $i++;
    }
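For comparison, here is a hedged sketch of one MySQL-based alternative. Perl hash lookups are roughly constant-time, so a likely bigger cost is launching a subprocess per row via backticks; but if memory for a million keys is the concern, the seen-set can live in an indexed table instead. Only AllJoinRecip, qseqid, and $dbh come from the post above; the Processed table name and the VARCHAR(64) width are assumptions, and this is a sketch against a live connection, not a drop-in replacement:

    # Assumption: $dbh is an already-connected DBI handle, as in the original code.
    # One-time setup: a lookup table keyed on the sequence ID. The PRIMARY KEY
    # gives indexed existence checks, and INSERT IGNORE silently skips duplicates.
    $dbh->do("CREATE TABLE IF NOT EXISTS Processed (qseqid VARCHAR(64) PRIMARY KEY)");

    # Fetch only rows with no match in Processed; the join replaces the
    # in-memory `exists $processed{$seq}` test for anything found in prior runs.
    my $sth = $dbh->prepare(
        "SELECT a.qseqid FROM AllJoinRecip a
         LEFT JOIN Processed p ON a.qseqid = p.qseqid
         WHERE p.qseqid IS NULL"
    );
    my $ins = $dbh->prepare("INSERT IGNORE INTO Processed (qseqid) VALUES (?)");

    $sth->execute() or die "execute failed: " . $sth->errstr();
    my $i = 1;
    while (my ($seq) = $sth->fetchrow_array()) {
        # The outer SELECT was snapshotted at execute time, so IDs discovered
        # mid-run can still come back from $sth; re-check each row cheaply.
        next if $dbh->selectrow_array(
            "SELECT 1 FROM Processed WHERE qseqid = ?", undef, $seq);
        my $out = `./friendoffriend.pl "$seq"`;
        chomp $out;
        # Record every ID the helper returned so later rows are skipped.
        $ins->execute($_) for split(',', $out);
        print "Cluster $i: $out\n";
        $i++;
    }

A side benefit of persisting the seen-set is that the job becomes restartable: if it dies at hour two, a rerun picks up only the unprocessed rows instead of starting over.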
In reply to Efficiency of a Hash with 1 Million Entries by gunr