in reply to Re^8: Memory Usage in Regex On Large Sequence
in thread Memory Usage in Regex On Large Sequence

If you're running out of memory when you load the entire chromosome, could you simply split the sequence into smaller fragments (e.g., <= 200,000 bp each) before searching them for motifs? This is a common technique in biological sequence analysis, and it may also help you spread the load more evenly across threads. If you take this approach, make sure consecutive fragments overlap by at least the length of your longest motif minus one, so you don't miss a motif that straddles a junction. A hit that falls entirely inside an overlap will be found twice, so deduplicate by absolute position. Finally, if you're doing this repeatedly, you can save the fragments as separate files so you only have to do the splitting once (the offset of each fragment can be included in the filename for easy reference).
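To make the overlap idea concrete, here is a rough sketch in Python (chosen only for illustration; the same loop should translate to Perl with substr and a global match). The function name find_motifs and its parameters are made up for this example, and it assumes you know an upper bound on motif length and that the motif does not overlap itself. For simplicity it takes the whole sequence as a string; to actually save memory you would read each window from a file instead.

```python
import re

def find_motifs(sequence, pattern, max_motif_len, fragment_len=200_000):
    """Search overlapping fragments; return sorted (absolute_start, match) pairs.

    Hypothetical helper for illustration. Assumes no match is longer than
    max_motif_len and that matches of the pattern do not overlap each other.
    """
    if fragment_len < max_motif_len:
        raise ValueError("fragment_len must be at least max_motif_len")
    overlap = max_motif_len - 1        # enough to cover any junction-straddling motif
    step = fragment_len - overlap      # distance between fragment start offsets
    regex = re.compile(pattern)
    hits = set()                       # a set drops hits seen twice in an overlap
    for offset in range(0, len(sequence), step):
        fragment = sequence[offset:offset + fragment_len]
        for m in regex.finditer(fragment):
            # Convert the fragment-relative position to an absolute one.
            hits.add((offset + m.start(), m.group()))
        if offset + fragment_len >= len(sequence):
            break                      # this fragment reached the end of the sequence
    return sorted(hits)
```

With fragment_len=8 and a 6 bp motif, a plain split at position 8 would cut a motif sitting at positions 4-9 in half; the 5 bp overlap is what lets the next fragment pick it up. Variable-length or self-overlapping motifs need more care, since a greedy match can be truncated at a fragment's end.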

HTH
