The creation of the indexed "db" file is not the main point of the script, but rather just a side effect of using the Bio::DB::Fasta module to build an indexed version of a set of data.
The main point of the script is to read the second input file, and for each ID string in that file, fetch the corresponding sequence data from the Library (by looking up that ID string in the indexed db file), and print the ID string together with its sequence data. This is printed to the terminal screen, but you can use redirection on the command line to have it stored in a separate file. Or you can open an explicit output file and print to that instead, like so:
Of course, my initial suggestion (not using the Bio module) would work also, like this (not tested):use strict; use Bio::DB::Fasta; my $db = new Bio::DB::Fasta( "Library.fa" ) # this should be the bigg +er file or die "Failed to create Fasta DB object on Library.fa\n"; open( IDLIST, "Name.fa" ); # this is the list of ID strings open( IDDATA, ">", "Name-sequences.fa" ) or die "Error opening Name-sequences.fa for output: $!"; while ( <IDLIST> ) { my ($id) = (/^>(\S+)/); # capture the id string (without the init +ial ">") print IDDATA ">$id\n", $db->seq( $id ), "\n"; }
It's just that if your library file happens to hold more data than will fit in RAM as a hash variable, the script will be rather slow to run. But try it out.use strict; my %falib; my $idstr; open( LIB, "Library.fa" ); while (<LIB>) { chomp; if ( /^>(\S+)/ ) { $idstr = $1; } elsif ( /^[acgt]+$/i ) { $falib{$idstr} .= uc(); # convert to upper-case } } open( IDLIST, "Name.fa" ); # this is the list of ID strings open( IDDATA, ">", "Name-sequences.fa" ) or die "Error opening Name-sequences.fa for output: $!"; while ( <IDLIST> ) { my ($id) = (/^>(\S+)/); # capture the id string (without the init +ial ">") print IDDATA ">$id\n", $falib{$id}, "\n" if ( exists( $falib{$id} +)); }
In reply to Re^5: comparing two fasta files
by graff
in thread comparing two fasta files
by nemo2
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |