"I would like to pull out all of the full length CDS with some 5' and 3' UTR info for the proteins I have found in the tissue by mass spec"

If I understand you correctly, I would look at the GFF/GTF coordinates of the CDS features and pull them out for the specific protein annotations in the GFF file. A FASTA file on its own does not carry these features, so it may not be the way to go unless you have an annotated genome somewhere to extract them from.
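A rough sketch of that idea, in case it helps. It assumes a GFF3 file with the usual nine tab-separated columns and a Parent= attribute on the CDS rows; the sub names cds_spans and with_flank are made up for illustration, and the flank size is whatever amount of UTR you decide you want.

```perl
use strict;
use warnings;

# Collect the overall CDS span per parent transcript from GFF3 lines.
# Assumes 9 tab-separated columns and a Parent= attribute on CDS rows.
sub cds_spans {
    my @lines = @_;
    my %span;
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;    # skip comments and blanks
        (my $row = $line) =~ s/[\r\n]+$//;
        my @col = split /\t/, $row;
        next unless @col == 9 and $col[2] eq 'CDS';
        next unless $col[8] =~ /(?:^|;)Parent=([^;]+)/;
        my $parent = $1;
        my ($start, $end) = @col[3, 4];
        my $s = $span{$parent} //= {
            seqid => $col[0], strand => $col[6], start => $start, end => $end,
        };
        $s->{start} = $start if $start < $s->{start};
        $s->{end}   = $end   if $end   > $s->{end};
    }
    return \%span;
}

# Widen a span by $flank bases on each side to take in some UTR sequence,
# clamping the start at position 1. The end is not clamped here because the
# contig length is not known from the GFF rows alone.
sub with_flank {
    my ($s, $flank) = @_;
    my $start = $s->{start} - $flank;
    $start = 1 if $start < 1;
    return { %$s, start => $start, end => $s->{end} + $flank };
}
```

With the widened coordinates in hand, something like substr($contig, $start - 1, $end - $start + 1) would pull the region out of the matching sequence; features on the '-' strand would still need reverse-complementing, and on a genomic sequence the span will include any introns.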

"I have tried simply blasting the peptides against the assembly but I get hits that are not in open reading frames"

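BLAST is a local alignment and search tool, so that behavior is probably to be expected: a hit only has to match a stretch of the sequence, and nothing requires that stretch to sit inside an open reading frame.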

" I would suppose the major difference is that I need to return all the ORFs from tens of thousands of entries as a new fasta file."

Read the file one sequence at a time, maybe with something like what follows. If you don't have BioPerl installed, there are certainly many other examples around here of reading a FASTA file without BioPerl.

# UNTESTED CODE TO DEMONSTRATE A WAY AROUND
use strict;
use warnings;
use Bio::SeqIO;

# Named arguments to Bio::SeqIO->new take a leading dash and a fat comma
my $in = Bio::SeqIO->new(-file => "FastA.fa", -format => "fasta");

while (my $seq = $in->next_seq) {
    my $sequence = $seq->seq;
    # Do something with $sequence
    my $ORFpattern = "foo";    # placeholder pattern
    if ($sequence =~ /$ORFpattern/) {
        # report or anything
    }
}
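If BioPerl is not an option, here is a minimal plain-Perl sketch of the whole round trip: read a FASTA file, find ORFs, write them to a new FASTA file. It rests on several assumptions of mine: an ORF is taken to be ATG through the first in-frame stop codon, only the three forward frames are scanned, and the sub names, the output header format and the 300 nt minimum length are all invented for the example.

```perl
use strict;
use warnings;

# Return every forward-strand ORF (ATG through the first in-frame stop codon)
# of at least $min_len nucleotides, as [1-based start, end, sequence].
sub find_orfs {
    my ($dna, $min_len) = @_;
    $dna = uc $dna;
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;    # offset of the first ATG seen since the last stop codon
        for (my $i = $frame; $i + 3 <= length $dna; $i += 3) {
            my $codon = substr $dna, $i, 3;
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';
            }
            elsif ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $len = $i + 3 - $start;
                push @orfs, [ $start + 1, $i + 3, substr($dna, $start, $len) ]
                    if $len >= $min_len;
                undef $start;
            }
        }
    }
    return @orfs;
}

# Read a FASTA file line by line into a list of [id, sequence] pairs.
sub read_fasta {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+$//;
        if ($line =~ /^>(\S+)/) {
            push @records, [ $1, '' ];
        }
        elsif (@records) {
            $line =~ s/\s+//g;
            $records[-1][1] .= $line;
        }
    }
    close $fh;
    return @records;
}

# Write every ORF found in $in_path to $out_path; returns the ORF count.
sub write_orfs {
    my ($in_path, $out_path, $min_len) = @_;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    my $count = 0;
    for my $rec (read_fasta($in_path)) {
        my ($id, $dna) = @$rec;
        for my $orf (find_orfs($dna, $min_len)) {
            my ($start, $end, $seq) = @$orf;
            print {$out} ">${id}_${start}_${end}\n$seq\n";
            $count++;
        }
    }
    close $out;
    return $count;
}

# Hypothetical command line: perl orfs.pl assembly.fa orfs.fa
write_orfs(@ARGV[0, 1], 300) if @ARGV == 2;
```

To cover the other strand you would run find_orfs on the reverse complement as well. Note that read_fasta keeps all records in memory, which should be fine for tens of thousands of entries but could be turned into a streaming loop if the file is very large.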

"Hi Pearl Monks"

We are Perl Monks, not Pearl Monks...

Welcome to the Monastery and good luck....


David R. Gergen said "We know that second terms have historically been marred by hubris and by scandal." and I am a two y.o. monk today :D, June,12th, 2011...

In reply to Re: How do I extract ORFs from a fasta file into a new fasta file by biohisham
in thread How do I extract ORFs from a fasta file into a new fasta file by Wasp_Guy
