extract sequence given positions from fasta

Gemchal has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am a PhD student who is using Perl to analyse some genome data. I am trying to write some code to extract a piece of sequence (given start and end co-ordinates) from a FASTA file of one genome. The FASTA file format is:

>genome1
ACTGTTACTTGTACCTCAGGGTTTTCTCTTTTTTTACGCGCTCAGTCAGTCCCATG
GTGCTGCCTGCATGCGTCAGTCA
etc.

I also have a text file which is tab-delimited and has 3 columns (gene number, gene start and gene finish), e.g.

1	0	825
2	837	1000
etc.

I would like the Perl script to output the sequence for each gene to a different file, with the gene number as the file name. Please help, even if it's just to suggest where to start; I am feeling a bit lost at the moment. I presume I need to make an array of the gene positions, start and finish, but I'm not sure where to go from there.

Thanks,
Gemma

Re: extract sequence given positions from fasta by umasuresh (Hermit) on Nov 29, 2010 at 15:53 UTC
For beginners, I recommend reading Beginning Perl for Bioinformatics. Go through all the exercises as well. Also read Markup in the Monastery to learn about code tags etc.
Re: extract sequence given positions from fasta by tospo (Hermit) on Nov 29, 2010 at 16:00 UTC
Yes, you will want to read your sequence ID/positional data into an array, ideally an array of hashes where the hash keys would be something like "id", "start", "end". Then you should probably use Bio::DB::Fasta to retrieve fragments of sequences from your FASTA file, but you could also use Bio::SeqIO and extract regions from your sequences using "substr" on the DNA string.
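To make the array-of-hashes plus "substr" idea concrete, here is a minimal sketch in core Perl only (no BioPerl). It is not a definitive implementation: the sub names, the ".fasta" output extension and the ">gene_N" header are all made up for illustration. It also assumes that the FASTA file holds a single record, and that the start co-ordinate is 0-based with an inclusive end. The original post does not say which convention the positions file uses, so check that before trusting the output.

```perl
use strict;
use warnings;

# Read a FASTA file and return its sequence as one string.
# ASSUMPTION: the file holds a single record (one genome); header lines
# are skipped and the sequence lines are joined together.
sub read_fasta_sequence {
    my ($fasta_file) = @_;
    open my $fh, '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my $seq = '';
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;       # strip newline and trailing whitespace
        next if $line =~ /^>/;    # skip the header line
        $seq .= $line;
    }
    close $fh;
    return $seq;
}

# Read the tab-delimited positions file into an array of hashes
# with the keys "id", "start" and "end".
sub read_gene_positions {
    my ($pos_file) = @_;
    open my $fh, '<', $pos_file or die "Cannot open $pos_file: $!";
    my @genes;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        next unless length $line;    # ignore blank lines
        my ($id, $start, $end) = split /\t/, $line;
        push @genes, { id => $id, start => $start, end => $end };
    }
    close $fh;
    return @genes;
}

# Return the piece of $seq between $start and $end.
# ASSUMPTION: $start is 0-based and $end is inclusive.
sub extract_region {
    my ($seq, $start, $end) = @_;
    return substr($seq, $start, $end - $start + 1);
}

# Write one file per gene, named after the gene number.
# The ".fasta" extension and the header text are illustrative choices.
sub write_gene_files {
    my ($seq, @genes) = @_;
    for my $gene (@genes) {
        my $out_file = "$gene->{id}.fasta";
        open my $out, '>', $out_file or die "Cannot write $out_file: $!";
        print {$out} ">gene_$gene->{id}\n",
            extract_region($seq, $gene->{start}, $gene->{end}), "\n";
        close $out;
    }
}

# Only run when both file names are given on the command line.
if (@ARGV == 2) {
    my ($fasta_file, $pos_file) = @ARGV;
    my $seq   = read_fasta_sequence($fasta_file);
    my @genes = read_gene_positions($pos_file);
    write_gene_files($seq, @genes);
}
```

Saved under a name of your choosing (say extract_genes.pl, a hypothetical name), it would be run as "perl extract_genes.pl genome.fasta positions.txt". If your co-ordinates turn out to be 1-based, or the end position is exclusive, only extract_region needs to change.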