Hello Monks, may I request your consummate wisdom on a wee question I have?
I have a hash where the key is a unique ID tag and the value is the sequence data for the gene corresponding to that ID. In the kinds of analyses I do I am generally dealing with relatively large hashes, around 15000 key/value pairs. I want to extract the sequence information for a given subset of genes whose IDs I have stored in an array. I can do it with the following code, which works pretty well (it uses a bit of BioPerl...):
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;

my $uniqueFile       = $ARGV[0];
my $goodProteinsFile = $ARGV[1];

## imports a bunch of gene IDs
open (FILE, $uniqueFile) or die "Cannot open $uniqueFile: $!";
my @data_in = <FILE>;
close FILE;

## create lookup hash; key = ID, val = sequence
my %goodProteins_hash;
my $in = Bio::SeqIO->new(-file=>$goodProteinsFile, -format=>'Fasta');
while (my $seq = $in -> next_seq() ) {
    my $id = $seq -> display_id();
    my $seq_string = $seq -> seq();
    $goodProteins_hash{$id} = $seq_string;
}

my $file_out = "strainSpecific_seqData.protein.fasta";

## iterate thru @data_in; if the line ends with $id then get at the value in
## %goodProteins_hash and store it in %strSpec... takes a while!
my %strSpec;
foreach (@data_in) {
    chomp ($_);
    while (my ($id, $seq) = each %goodProteins_hash) {
        if ($_ =~ /($id)$/) {
            $strSpec{$id} = $seq;
        }
    }
}

open (OUT, ">", $file_out) or die "Cannot open $file_out: $!";
while (my ($k, $v) = each %strSpec) {
    ## print to file
    print OUT ">$k\n$v\n";
}
close OUT;

print "- Finished\n";
So I am getting to the sequence data for the IDs I want by looping through the '@data_in' array, then using a 'while each' on the %goodProteins_hash followed by an if... Perhaps not surprisingly this takes quite a long time per input ID, and if I want to get out a lot of sequences it takes ages!
So my question is: is there a quicker and more efficient way of doing something like this?? I tried playing around with grep and exists etc but I couldn't get it to do what I wanted...
Your responses, as always, are very much appreciated!
Thanks :-)
In reply to search and extract from a large hash by reubs85
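The inner 'while each' runs a regex against every key of %goodProteins_hash for every input ID, so the work grows as (number of IDs) x (number of hash entries). Since the hash is already keyed by ID, one direct lookup per input line should be enough. Below is a minimal sketch of that idea, not a drop-in replacement: extract_subset is a hypothetical helper name, the small hash stands in for the one built with Bio::SeqIO, and it assumes the ID is the last whitespace-separated token on each input line (the original regex only requires that the line ends with the ID, so adjust the pattern if your lines look different).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return a hashref holding
# only the entries of %$seq_for whose IDs are listed in @$wanted_lines.
# Each ID costs one hash lookup instead of a scan over every key.
sub extract_subset {
    my ($seq_for, $wanted_lines) = @_;
    my %subset;
    for my $line (@$wanted_lines) {
        # Assumption: the ID is the last whitespace-separated token on the
        # line; blank lines do not match and are skipped.
        my ($id) = $line =~ /(\S+)\s*$/ or next;
        $subset{$id} = $seq_for->{$id} if exists $seq_for->{$id};
    }
    return \%subset;
}

# Stand-in data; in the real script this hash comes from Bio::SeqIO
# and @data_in from the ID file.
my %goodProteins_hash = (
    gene1 => 'MKV',
    gene2 => 'MLS',
    gene3 => 'MTT',
);
my @data_in = ("gene3\n", "gene1\n", "geneX\n");

my $strSpec = extract_subset(\%goodProteins_hash, \@data_in);
print ">$_\n$strSpec->{$_}\n" for sort keys %$strSpec;
```

IDs that are not in the hash (geneX above) are silently skipped because of the exists test; add a warn there if you would rather hear about them.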