Re: Extract subset of sequences from a FASTA file

I therefore wanted to create a file that randomly selects 1000 sequences, and then put these in a new FASTA file.

How many sequences are you starting with?
What is their total combined size?
Are they all in a single file, or in separate files (say, one per primer)?
How are you reading and writing your files?
Using standard file handling -- readline & print -- or some specialist module?

The reason for all the questions is that some approaches are better than others, depending upon your answers.

For example, if you have all the sequences in a single file, then there is a method of making a random selection of those records, without loading them all into memory first.

If the combined total is very large (i.e. a few million longish sequences, or a few billion shorter ones), not having to load them all at once can be very convenient.
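To make that concrete, here is a rough sketch of one such method, reservoir sampling, which makes a single pass over the file and only ever holds the chosen records in memory. It is written in Python purely for illustration; the function names (read_fasta, reservoir_sample, sample_fasta) and the default of 1000 are my own placeholders, and it assumes a plain FASTA layout where every record starts with a '>' header line.

```python
import random

def read_fasta(handle):
    """Yield (header, sequence_lines) for each record in an iterable of lines."""
    header, lines = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, lines
            header, lines = line, []
        elif header is not None and line:
            lines.append(line)
    if header is not None:
        yield header, lines

def reservoir_sample(records, k, rng=random):
    """Return a uniform random sample of up to k items from an iterable
    whose length is not known in advance."""
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)           # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive at both ends
            if j < k:
                sample[j] = rec          # replace with probability k/(i+1)
    return sample

def sample_fasta(in_path, out_path, k=1000, seed=None):
    """Write a random selection of k records from in_path to out_path."""
    rng = random.Random(seed)
    with open(in_path) as src:
        picked = reservoir_sample(read_fasta(src), k, rng)
    with open(out_path, "w") as dst:
        for header, lines in picked:
            dst.write(header + "\n")
            for seq_line in lines:
                dst.write(seq_line + "\n")
    return len(picked)
```

Something like sample_fasta("all.fa", "subset.fa", k=1000) would then produce the new file (both file names are made up). Note the selected records come out in reservoir order, not in their original file order.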

Another example. The Bio* FASTA file handling modules aren't very convenient for your purpose because they only allow you to iterate over the sequences sequentially, or access them by ID; not randomly. So at the very least you would have to iterate over the file(s) and copy all the IDs into a secondary data structure -- an array, say -- in order to make your random selection.
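The "secondary array" idea can be sketched without any Bio* module at all. The following is only an illustration in Python, and it swaps the IDs for byte offsets: one pass records where each '>' header line starts, a random selection is made from that array, and the chosen records are then copied by seeking straight to them. The names index_fasta and copy_random_records are hypothetical, and the code assumes every record begins with a '>' at the start of a line.

```python
import random

def index_fasta(path):
    """Return the byte offset of every header line ('>') in the file."""
    offsets = []
    pos = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b">"):
                offsets.append(pos)
            pos += len(line)             # track offsets by hand
    return offsets

def copy_random_records(in_path, out_path, k=1000, seed=None):
    """Copy up to k randomly chosen records from in_path to out_path."""
    offsets = index_fasta(in_path)
    rng = random.Random(seed)
    # Sorting keeps the output in the original file order.
    chosen = sorted(rng.sample(offsets, min(k, len(offsets))))
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for off in chosen:
            src.seek(off)
            dst.write(src.readline())    # the header line itself
            for line in iter(src.readline, b""):
                if line.startswith(b">"):
                    break                # reached the next record
                dst.write(line)
    return len(chosen)
```

This only keeps one integer per sequence in memory, at the cost of reading the input twice.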

It's also not clear from your description whether you want a single file of 1000 sequences selected from across all the primer sets, or 1000 from each primer set.

