It's not my program/code that is consuming the memory; it's a system call to a C program called muscle. My script comes nowhere near muscle's memory requirements. My script uses a simple hash strategy to find sequences that are similar to one another (and likely to yield positive results) and to decide which sets of sequences to send together in a single call to muscle.

An exhaustive approach that takes very little memory would be to run muscle separately on every likely pair. I have code that does this, but it takes a few orders of magnitude longer than running them all together and parsing the results, because muscle runs much faster on large groups of sequences than it does on a full set of "likely" pairwise comparisons. But if I group too many sequences together, I hit memory limits and things start crashing.

So I would like to compute a maximum number of sequences to run per gig of memory, and break up my analysis into as many pieces as it takes to stay under that limit, so that I get the best performance without crashing.
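Here is a minimal sketch of what I have in mind, assuming a Linux box where /proc/meminfo is readable and a hypothetical sequences-per-GiB constant that I would still have to calibrate against my own data:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use POSIX qw(ceil);

    # Hypothetical tuning constant: how many sequences one muscle run can
    # handle per GiB of RAM. Would need to be measured empirically.
    my $SEQS_PER_GIB = 500;

    # Total physical memory in GiB, read from /proc/meminfo (Linux only).
    sub total_mem_gib {
        open my $fh, '<', '/proc/meminfo'
            or die "Cannot read /proc/meminfo: $!";
        while (<$fh>) {
            return $1 / ( 1024 * 1024 ) if /^MemTotal:\s+(\d+)\s+kB/;
        }
        die "MemTotal not found in /proc/meminfo\n";
    }

    # Split a list of sequence ids into the fewest, evenly sized chunks
    # that stay under the per-run cap implied by available memory.
    sub chunk_by_memory {
        my @seq_ids = @_;
        return () unless @seq_ids;

        my $cap = int( total_mem_gib() * $SEQS_PER_GIB );
        $cap = 1 if $cap < 1;

        my $pieces = ceil( @seq_ids / $cap );      # muscle runs needed
        my $size   = ceil( @seq_ids / $pieces );   # sequences per run

        my @chunks;
        push @chunks, [ splice @seq_ids, 0, $size ] while @seq_ids;
        return @chunks;
    }

Each chunk would then get written out to its own FASTA file and handed to muscle via a system() call, the same way I do now, just with the group size capped instead of unbounded.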
Rob