in reply to Searching array against hash
Hi, out of curiosity, as I work with fasta files a bit, why not use the Bio::DB::Fasta module from BioPerl for indexed access to any number of fasta files. Your example is precisely what this module is for, unless you're exercising OO programming. I used your sample data for the code below, if interested.
update: Initially, creating the db may take a moment depending on size of input files, but it works pretty quick every time it's accessed after that.
#!/usr/bin/perl use Modern::Perl '2011'; use autodie; use Bio::DB::Fasta; die "Usage: $0 <fastaFile> <queryFile>\n" unless my( $fastaFile, $quer +yFile ) = @ARGV; open my $fh, '<', $queryFile; chomp( my @queries = <$fh> ); close $fh; # trim header @queries = map{ local $_ = $_; s/^>(\S+)\s+\d+/$1/; $_ } @queries; # create indexed database from fasta file my $db = Bio::DB::Fasta->new( $fastaFile ); open my $fh2, '>', 'querySeqs.fa'; for my $query ( @queries ) { my $seqheader = $db->header( $query ); my $seqdna = $db->seq( $seqheader ); say {$fh2} ">$seqheader"; print {$fh2} format_sequence( $seqdna, 60 ); } close $fh2; sub format_sequence { my($sequence, $length) = @_; # sequence and length of line open my $string_fh, '>', \ my $string; # print sequence to string for( my $pos = 0; $pos < length( $sequence ); $pos += $length ) { print {$string_fh} substr( $sequence, $pos, $length ), "\n"; } $string; } output in file querySeqs.fa >Chr1.ID=PAC:19650373;Name=AT1G11900.1;pacid=19650373;longest=1;Parent +=AT1G11900.len361.i1.1_111 GTATGAATTCCAAAAATCCAGAACCGTTTTCGTGATTCATGTTATGCTCTCGTTGTTGTT TTCTGATTGTTACTGCTCAGCGAGTTTCTTCTATCAATGTTTGATTCGATGAAGATGCGA AATTTCGAACCATTGCTGTTCTTCTGAGTTTGATCGTTTTTTAGTTTCGGGGCTTTCACG CTTCAGCTAGTGTTTGATTACGAAGTTTTCTGATTAAATGTGTGAGTTTTTTTGTAGTCA TCTCGAAAACTGAGAAATCCATTTTTATAGATTTACATTGTTCATAGTTATATGTGGAAG TTGATGATTGATGGTGATTCTGCAAATTGATGATTTGGTTTTCTGTTTATTGGCATTGCA G >Chr1.ID=PAC:19657430;Name=AT1G76660.1;pacid=19657430;longest=1;Parent +=AT1G76660.len490.i1.2_394 GTTTCTGTTTAATTCTCTCTATTTTCGTTTGATTTCGACTTCTTGAGCTTTTACTTCTCT CTGTCTCAGTTCTAACTTCTTCAGATTTTAAAGCTTTCGTTTTTTTGGCAAGTTGTTTTT TTTTCCTACTTAGATCTGACTACTCCGACTCTGTTCACACTAATGTTCGTTAGGGTTTAT GTTGAATCTCTCCTTTGATCATTATGTTATTGTAAAAATCCCAGCTTTATGCTAAATCGA GCTAGTGATTCTTGAGAATTGAACAAAAAAGTTTTACATTTTTCTGAATTGCCATTCAAT TAGAAGAAGAAAAAATTCAACCTTTTACTGGTTATGATCTAGATTCGATGCGTGTAAGCT ATAAGATCACCATTTCGTGCTTTAGATCCATAATCATTGATTCACTATATGGCAATTATC TTCTTGCTTCACAGATCTCTTTTACACTTACATGTCAAGTGTCTGAGTGTGTGTGTGTGT CCTTTTGCAG >Chr1.ID=PAC:19652608;Name=AT1G31420.1;pacid=19652608;longest=1;Parent +=AT1G31420.len415.i1.13_217 GTTCTTATAATTCTTACTAATTCTGTTTGCTTTTAAAGCTATAACCTTTGATATTGTTGG AAAGAGTGGGGTTTTGGGTCTTTTGCTGAATCGTTTTTTGGATTTGTTATATTGTTCGAA TCTTCAGTTGTTATTGTGTTATAGGATTGGATGGTATCTGGGAATTTCGTAATCTATGTT AAGCTAGAGCTGTTTTTGAGCTCTTTTGTTGATGATGATTTTGAGATTGTTGGCCGAATT TAGCTCTCGTTTTCTGATTTTAGCAATTGGAAAGTGTGTATTGGTTCTTGTGAGGCAATT TCACTGTTTTGAGTACTCAAAATGTAGATGAGAGCATGCATAAGTTGTGTGGAGACTGAG CTTAATGTGTAGTGTAATTGACAATTAGTTTTGTGGGCTTTCCTTTGTTTTTCAG
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: Searching array against hash
by BrowserUk (Patriarch) on Aug 22, 2013 at 01:55 UTC | |
by abualiga (Scribe) on Aug 22, 2013 at 02:41 UTC | |
by BrowserUk (Patriarch) on Aug 22, 2013 at 03:23 UTC | |
by bioinformatics (Friar) on Aug 22, 2013 at 02:25 UTC | |
by BrowserUk (Patriarch) on Aug 22, 2013 at 03:18 UTC | |
by bioinformatics (Friar) on Aug 22, 2013 at 03:31 UTC | |
by BrowserUk (Patriarch) on Aug 22, 2013 at 04:22 UTC |