Hi again,

I guess that your second file is a library downloaded from GenBank or from a genome database, and that you want to extract from it some sequences of interest (obtained from an experiment, a BLAST search, or something similar). If I am correct, I suppose you will be repeating the same job several times, searching for different sequences. I think a good approach would be to create an indexed database for the second file (the library, which will not change), and then search that database by gene name.
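To make the idea of an index concrete, here is a minimal plain-Perl sketch. It assumes the sequence ID is the first word after ">" on each header line, and the sub names build_index and fetch_seq are only illustrative. It scans the library once, remembers the byte offset of each record, and later jumps straight to the record it needs instead of rereading the whole file.

```perl
use strict;
use warnings;

# Scan the library once and remember the byte offset of every header line.
sub build_index {
    my ($library) = @_;
    open my $fh, '<', $library or die "Cannot open $library: $!";
    my %offset;
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        # Assumption: the ID is the first word after '>'.
        $offset{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell $fh;
    }
    close $fh;
    return \%offset;
}

# Jump straight to one record and return its sequence as a single string.
# Returns undef if the ID is not in the index.
sub fetch_seq {
    my ($library, $index, $id) = @_;
    return undef unless exists $index->{$id};
    open my $fh, '<', $library or die "Cannot open $library: $!";
    seek $fh, $index->{$id}, 0;
    my $header = <$fh>;          # read and discard the header line
    my $seq = '';
    while (my $line = <$fh>) {
        last if $line =~ /^>/;   # the next record starts here
        $line =~ s/\s+//g;       # drop newlines and stray whitespace
        $seq .= $line;
    }
    close $fh;
    return $seq;
}
```

This keeps the index in memory only, so it is rebuilt on every run; a real indexed database stores it on disk so that later searches can skip the scan.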

You can use the Bio::DB::Fasta module (read its perldoc; there is more documentation in BioPerl), which will make the task easier, but it would be good for you to understand what is going on. In pseudocode, it should be:

use Bio::DB::Fasta
get the name of the second_file (library)
create the database (use the "new" function of Bio::DB::Fasta)
while read each line of first_file (your_seqs)
    get the name
    look for the corresponding sequence in the db (use a module function)
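The pseudocode above might be written roughly as follows. This is a sketch under a few assumptions: the first file holds one sequence ID per line, the sub name extract_seqs and the file names in the usage comment are made up for illustration, and seq() is taken to return undef for an ID that is not in the library (check the perldoc for your BioPerl version).

```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Look up every ID listed in $ids_file in the indexed FASTA $library
# and return the hits as FASTA-formatted text.
sub extract_seqs {
    my ($library, $ids_file) = @_;

    # The index is built on the first call and reused on later runs.
    my $db = Bio::DB::Fasta->new($library);

    open my $in, '<', $ids_file or die "Cannot open $ids_file: $!";
    my $out = '';
    while (my $id = <$in>) {
        chomp $id;
        $id =~ s/^>//;            # tolerate IDs copied with the FASTA '>'
        $id =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next unless length $id;

        my $seq = $db->seq($id);  # assumed to be undef for an unknown ID
        if (defined $seq) {
            $out .= ">$id\n$seq\n";
        }
        else {
            warn "No sequence found for '$id'\n";
        }
    }
    close $in;
    return $out;
}

# Hypothetical usage: perl extract.pl library.fasta ids.txt > hits.fasta
print extract_seqs(@ARGV) if @ARGV == 2;
```

The index file is written next to the library, so that directory needs to be writable the first time the script runs.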

This is the basic approach I use. There may be better ways, but it gets the job done in a few lines of code, and it runs fast.

Once you get the basics of Perl, the BioPerl modules will be very helpful to you.


In reply to Re: comparing two fasta files by rogerd
in thread comparing two fasta files by nemo2
