in reply to comparing two fasta files
Hi again,
I guess that your second file is a library, downloaded from GenBank or from a genome database, and that you want to extract from that library some sequences that are of interest to you (you got them as a result of some experiment, or a BLAST search, or something similar). If I am correct, I suppose that you will be repeating the same work several times, searching for different sequences. I think that a good approach would be to create an indexed database for the second file (the library, which will not change), and then search that database by gene name.
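To make the idea of an "indexed database" concrete, here is a minimal pure-Perl sketch of what such an index does: it remembers where each record starts in the library file, so a later lookup can jump straight to it instead of rereading the whole file. The sub names build_index and fetch_sequence are hypothetical, and I am assuming that the sequence name is the first word after ">" on each header line.

```perl
use strict;
use warnings;

# Hypothetical helper: map each sequence ID (first word after '>')
# to the byte offset of its header line in the library file.
sub build_index {
    my ($fasta_path) = @_;
    open my $fh, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    my %offset_of;
    my $pos = tell $fh;                 # offset of the line about to be read
    while (my $line = <$fh>) {
        $offset_of{$1} = $pos if $line =~ /^>(\S+)/;
        $pos = tell $fh;
    }
    close $fh;
    return \%offset_of;
}

# Hypothetical helper: jump to one record and return its sequence
# as a single string (nothing is returned if the ID is unknown).
sub fetch_sequence {
    my ($fasta_path, $index, $id) = @_;
    return unless exists $index->{$id};
    open my $fh, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    seek $fh, $index->{$id}, 0;         # go directly to the header line
    my $header = <$fh>;                 # skip the header itself
    my $seq = '';
    while (my $line = <$fh>) {
        last if $line =~ /^>/;          # next record starts here
        $line =~ s/\s+//g;              # drop newlines and stray whitespace
        $seq .= $line;
    }
    close $fh;
    return $seq;
}
```

This keeps the index only in memory, so it is rebuilt on every run. A real indexed database stores it on disk, which is the part that pays off when you search the same library many times.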
You can use the Bio::DB::Fasta module (you can read the perldoc of that module, and there is more documentation in BioPerl), which will make the task easier, but it would be good for you to understand what is going on. In pseudocode, it should be:
use Bio::DB::Fasta
get the name of the second_file (library)
create the database (use the "new" function of Bio::DB::Fasta)
while read each line of first_file (your_seqs)
    get the name
    look for the corresponding sequence in the db (use a module function)
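In real Perl, the pseudocode above could look roughly like the sketch below. It is only a sketch under some assumptions: the sub name extract_sequences is hypothetical, the names in the first file are taken from its ">" header lines (first word only), and the directory holding the library must be writable because Bio::DB::Fasta stores its index file next to it.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Hypothetical helper: for every name found in $seqs_path, look up the
# matching record in the indexed library and print it as FASTA to $out_fh.
# Returns the number of sequences found.
sub extract_sequences {
    my ($library_path, $seqs_path, $out_fh) = @_;

    # new() indexes the library on first use; later runs reuse the index.
    my $db = Bio::DB::Fasta->new($library_path);

    open my $in, '<', $seqs_path or die "Cannot open $seqs_path: $!";
    my $found = 0;
    while (my $line = <$in>) {
        # Assumption: names come from header lines, first word after '>'.
        next unless $line =~ /^>(\S+)/;
        my $id = $1;

        # get_Seq_by_id() gives back a sequence object, or undef if absent.
        my $seq_obj = $db->get_Seq_by_id($id);
        if (defined $seq_obj) {
            print {$out_fh} ">$id\n", $seq_obj->seq, "\n";
            $found++;
        }
        else {
            warn "$id not found in $library_path\n";
        }
    }
    close $in;
    return $found;
}
```

It could then be called as extract_sequences('library.fasta', 'my_seqs.fasta', \*STDOUT); where both file names are placeholders for your own files.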
This is the basic approach that I usually use. Maybe there are better ways, but with a few lines of code I get the work done, and it runs fast.
Once you get the basics of Perl, the BioPerl modules will be very helpful to you.