If I go to my command-line prompt and type perl DNA_sequences.pl NC_001666 rps12, my program will (or should) go into NC_001666.txt and figure out the range of bases that the gene rps12 spans. Here is what I have so far:

```perl
#!/usr/bin/perl

($root, $gene) = @ARGV;

open(TXT,   "<", "$root.txt")   || die "Cannot open $root.txt: $!";
open(FASTA, "<", "$root.fasta") || die "Cannot open $root.fasta: $!";

# Look for the gene's line in the text file, e.g. "rps7 -(91772..92242)".
$found = 0;
while (<TXT>) {
    # The dots are escaped so they match literally, and the optional
    # minus sign in front of the range is captured in $strand.
    next unless ($strand, $start, $stop) = m/^$gene\s+(-?)\((\d+)\.\.(\d+)\)/i;
    $found = 1;
    last;
}
# Test $found here; "$found=1" would assign and always be true.
die "Did not find gene $gene in $root.txt" unless $found;

$found = 0;
while (<FASTA>) {
    chomp;
    @x = split;
    if ($x[0] >= $start) {
        # start-logic here
    }
}
while (<FASTA>) {
    chomp;
    @x = split;
    if ($x[0] <= $stop) {
        # stop-logic here
    }
}
# print logic here
```
If the gene's range is, for example, base 91772 to base 92242, the program should then go into the FASTA file and pull out all bases from the 91772nd to the 92242nd, inclusive. If there is a negative sign in front of the range in the text file (as in the lines shown below), the order of the bases printed out should be reversed, with every a converted to t and every c to g, and vice versa.

I'm sure a lot of people here already know this, but every FASTA file has a header that takes up the entire first line, and the bases start on the next line. If you would like to see one, the FASTA file for species NC_001666 is linked at the end of this post.

Here are some lines from the text file:

    rps12 -(92301..93101)
    rps7 -(91772..92242)
    ndhB -(89236..91472)
    trnL -(88584..88664)
    trnI -(84881..84954)
    rpl23 -(84433..84714)
    rpl2 -(82930..84414)
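The extraction rule described above (a 1-based, inclusive range, reverse-complemented when the range carries a minus sign) could be sketched roughly as follows. This is only a minimal illustration: the helper name extract_range and the 12-base test string are made up for the example, and it assumes the whole sequence is already held in a single string.

```perl
use strict;
use warnings;

# Hypothetical helper: return bases $start..$stop (1-based, inclusive) of $seq.
# If $reverse is true, return the reverse complement of that range instead.
sub extract_range {
    my ($seq, $start, $stop, $reverse) = @_;
    # substr is 0-based, so the start shifts by one; the length is inclusive.
    my $sub = substr($seq, $start - 1, $stop - $start + 1);
    if ($reverse) {
        $sub = reverse $sub;             # reverse the order of the bases
        $sub =~ tr/acgtACGT/tgcaTGCA/;   # swap a<->t and c<->g, keeping case
    }
    return $sub;
}

# Made-up sequence, for illustration only.
my $seq = "aaccggttacgt";
print extract_range($seq, 2, 5, 0), "\n";   # accg
print extract_range($seq, 2, 5, 1), "\n";   # cggt
```

Using tr for the complement swaps all four bases in a single pass, which avoids the trap of converting a to t and then converting those same t's back to a.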
The top part of the code, which gets the start and stop range from the text file, works pretty well, I think. Everything from the second $found = 0 (just before the first while (<FASTA>)) onward is just me guessing at how to start that part, and I didn't know where else to go with it. I got it started, but then was at a loss for how to get the bases out of the FASTA file and print only the range I need for the given species and gene. I also need to figure out how to reverse the output, as described above, when there's a negative sign.

Thanks for taking the time to look at this.

-statsman5

FASTA file link: http://www.ncbi.nlm.nih.gov/nuccore/11994090?report=fasta&log$=seqview
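For getting the bases out of the FASTA file, one possible approach is to drop the header line and join the remaining lines into one long string. The sketch below assumes the file holds a single record, as described in the post; the helper name read_fasta_sequence and the stand-in data are hypothetical, and an in-memory filehandle is used so the example runs without the real NC_001666.fasta.

```perl
use strict;
use warnings;

# Hypothetical helper: read one FASTA record from an open filehandle and
# return its bases as a single string, with the header line dropped.
sub read_fasta_sequence {
    my ($fh) = @_;
    my $header = <$fh>;           # the first line is the header; discard it
    my $seq = '';
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;        # strip the newline and any stray whitespace
        $seq .= $line;
    }
    return $seq;
}

# Stand-in data (not the real NC_001666 record), read via an in-memory filehandle.
my $fasta = ">example header\naaccgg\nttacgt\n";
open(my $fh, "<", \$fasta) or die "Cannot open in-memory FASTA: $!";
my $seq = read_fasta_sequence($fh);
close($fh);

print length($seq), " bases: $seq\n";   # 12 bases: aaccggttacgt
```

With the whole sequence in one string, base positions map directly onto string offsets (position 1 is offset 0), so there is no need to track positions line by line.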
In reply to Extracting DNA sequences from FASTA files by statsman5