Hi again,
I guess that your second file is a library, downloaded from GenBank or from a genome database, and that you want to extract from it some sequences that are of interest to you (you got them as a result of some experiment, or a BLAST search, or something). If I am correct, I suppose that you will be repeating the same work several times, searching for different sequences. I think that a good approach would be to create an indexed database for the second file (the library, which will not change), and then search that database by gene name.
You can use the Bio::DB::Fasta module (read its perldoc; there is more documentation in BioPerl), which will make the task easier, but it would be good to understand what is going on. In pseudocode, it would be:
    use Bio::DB::Fasta
    get the name of the second_file (library)
    create the database (use the "new" function of Bio::DB::Fasta)
    while read each line of first_file (your_seqs)
        get the name
        look for the corresponding sequence in the db (use a module function)
This is the basic approach that I usually take. Maybe there are better ways, but it gets the work done in a few lines of code, and it runs fast.
Once you get the basics of Perl, the BioPerl modules will be very helpful to you.
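A minimal sketch of those steps in Perl, in case it helps. It assumes that the first file holds one sequence name per line (a leading ">" is tolerated) and that the names match the first word of the header lines in the library; both file names come from the command line and are only placeholders. As far as I know, Bio::DB::Fasta writes its index file next to the library on the first run and reuses it afterwards, so you need write permission in that directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Placeholder arguments: a file of names and the FASTA library.
my ($names_file, $library_file) = @ARGV;
die "Usage: $0 names_file library.fasta\n" unless defined $library_file;

# Create (or reopen) the indexed database for the library.
my $db = Bio::DB::Fasta->new($library_file);

open my $in, '<', $names_file or die "Cannot open $names_file: $!";
while (my $line = <$in>) {
    chomp $line;

    # Assumption: one name per line, optionally written as ">name".
    my ($id) = $line =~ /^>?\s*(\S+)/;
    next unless defined $id;    # skip blank lines

    # Look the name up in the index; the result should be undefined
    # when the library has no entry under that name.
    my $seq = $db->seq($id);
    if (defined $seq) {
        print ">$id\n$seq\n";
    }
    else {
        warn "No entry for '$id' in $library_file\n";
    }
}
close $in;
```

If you need the full header line or a sequence object instead of the plain string, the module also offers header() and get_Seq_by_id(); check the perldoc for the details.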
In reply to Re: comparing two fasta files
by rogerd
in thread comparing two fasta files
by nemo2