Re: fasta hash

It's usually more efficient to do it the other way round: first read the file with the IDs and store the IDs in a hash, then go through the fasta file and print each record (header and sequence lines) whose ID appears in the hash. That way you only keep the short IDs in memory instead of every sequence.
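A minimal sketch of that approach. The sub names `read_ids` and `filter_fasta` are made up for illustration. The sketch also assumes the ID is the first word after `>` in each header, so adjust the regex if your IDs sit elsewhere. In-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read one ID per line into a hash. Only the keys
# matter for the lookup, so every value is simply 1.
sub read_ids {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;        # trim whitespace, including the newline
        next if $line eq '';            # skip blank lines
        $wanted{$line} = 1;
    }
    return \%wanted;
}

# Hypothetical helper: stream the FASTA data and print every line of each
# record whose ID is in %$wanted. Assumes the ID is the first word after '>'.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S+)/;
            $keep = defined $id && exists $wanted->{$id};   # decide once per header
        }
        print {$out} $line if $keep;
    }
    return;
}

# Demo on hypothetical in-memory data instead of real files.
my $ids   = "seq2\n\nseq3\n";
my $fasta = ">seq1 first\nACGT\n>seq2 second\nGGCC\nTTAA\n>seq3\nAAAA\n";

open my $ids_fh,   '<', \$ids   or die "Cannot open IDs: $!";
open my $fasta_fh, '<', \$fasta or die "Cannot open FASTA: $!";
my $result = '';
open my $out_fh,   '>', \$result or die "Cannot open output: $!";

filter_fasta($fasta_fh, $out_fh, read_ids($ids_fh));
close $out_fh;
print $result;
```

Only the hash of IDs stays in memory. Each FASTA line is printed or dropped as soon as it is read.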

Regarding your code: use strict and warnings, and indent the code properly, for example by 4 spaces for each level of nesting (each opening brace). It really does make code readable. See perlstyle.

@data=split(" ",$line);
$fastahash{$fastaID}=$sequence;

This is almost certainly wrong: the hash key ($fastaID) doesn't depend on $line, so whatever it is, it's not the current ID. First assign to $fastaID, then use it as a hash key.
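Here is a sketch of the corrected order, where the ID is taken from the header line first and only then used as the key. It assumes the ID is the second whitespace-separated field of the header, which is what the original poster's later code uses (`$data[1]`). The sub name `read_fasta` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of ID => sequence from a FASTA stream.
# Assumption: the ID is the second whitespace-separated header field.
sub read_fasta {
    my ($fh) = @_;
    my (%fastahash, $fastaID);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                  # skip blank lines
        if ($line =~ /^>/) {
            my @data = split ' ', $line;
            $fastaID = $data[1];                    # first assign the current ID...
            $fastahash{$fastaID} = '' if defined $fastaID;   # ...then use it as the key
        }
        elsif (defined $fastaID) {
            $fastahash{$fastaID} .= $line;          # append to the current record
        }
    }
    return \%fastahash;
}

my $fasta = ">a id1 first\nACGT\nTT\n>b id2\nGG\n";   # hypothetical sample
open my $fh, '<', \$fasta or die "Cannot open FASTA: $!";
my $h = read_fasta($fh);
print "$_\t$h->{$_}\n" for sort keys %$h;
```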

Perl 6 - second systems done right

Re^2: fasta hash by morio56 (Initiate) on Aug 26, 2011 at 13:51 UTC
Thanks. But then what should the value for each ID key be, since the IDs file contains only IDs and nothing else?
Re^3: fasta hash by ForgotPasswordAgain (Vicar) on Aug 26, 2011 at 22:37 UTC
As moritz said, 1 (or ++) is a common choice of value, but it's not necessarily the best one. It doesn't usually matter these days for 100k elements, but it's better (thanks, Liz! ;) for memory size to do something like this: `my ($undef); while (whatever...) { .... $hash{$key} = $undef; }` This way each element points to the same $undef value. Otherwise, each element would point to a different copy of the value 1. That's a kind of "poor man's aliasing". For bonus points, you might look at Array::RefElem or Data::Alias.
Re^3: fasta hash by moritz (Cardinal) on Aug 26, 2011 at 13:56 UTC
Whatever you choose it to be. `1` is a common choice.
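A small illustration of why the value hardly matters. It compares storing `1` with assigning the empty list to a hash slice, which creates the keys with undef values, and tests membership with `exists`. The IDs below are hypothetical.

```perl
use strict;
use warnings;

my @ids = qw(2056360013 2056360014);    # hypothetical IDs

# Option 1: store 1 as the value.
my %with_one;
$with_one{$_} = 1 for @ids;

# Option 2: assigning the empty list to a hash slice creates the keys
# with undef values; for a membership test only the keys matter.
my %with_undef;
@with_undef{@ids} = ();

for my $id ('2056360013', '9999') {
    # exists looks only at the key, so it works for both hashes; a plain
    # truth test such as if ($with_undef{$id}) would be false here.
    printf "%s: %s %s\n", $id,
        (exists $with_one{$id}   ? 'yes' : 'no'),
        (exists $with_undef{$id} ? 'yes' : 'no');
}
```

With undef values you must use `exists`. With `1` a plain truth test also works.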
Re^4: fasta hash by morio56 (Initiate) on Aug 26, 2011 at 15:12 UTC
I have changed the code, but now my problem seems to be that outside the loops I can only access the last line of the output. Is there a way to store the variables inside the loop so that they are accessible outside? The code looks like this now:

if(@ARGV < 3){
    die "Not enough arguments\n";
}
$sequence="";
$fastaID;
open(FILE1,"$ARGV[0]") or die "No fasta file provided in command line: $!\n";
while ($line=<FILE1>){
    chomp($line);
    if ($line=~/^\s$/){
        next;
    }elsif ($line=~/^.$/){
        $fastaID=$line;
        $fastahash{$fastaID}=1;
    }
}
open(FILE2,"$ARGV[2]") or die "No fasta file provided in command line: $!\n";
while($line2=<FILE2>){
    chomp($line2);
    if ($line2=~/^>/){
        @data=split(" ",$line2);
        $fasta=$data[1];
        $sequence="";
    }else{
        $sequence.=$line2;
    }
}
if (exists $fastahash{$fasta}){
    print "$fastaID\t $sequence\n";
}
exit;

And the output, which is just the last key value in the fastahash, is:

`2056360013 Musacgagchagshgashcgahcgacacsasasasacsacsasasacacaascassacsaascascascascac`
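In the code above, the `if (exists $fastahash{$fasta})` check sits after the second loop, so it only ever sees the last record. It also prints `$fastaID`, which at that point holds the last line of the IDs file rather than the current record's ID. One possible fix is sketched below. Do the check inside the loop whenever a new header starts, then once more after the loop for the final record. The sub name `print_matches` and the sample data are hypothetical, and the ID is assumed to be the second header field, as in `$data[1]` above.

```perl
use strict;
use warnings;

# Hypothetical helper: print "ID<TAB>sequence" for each FASTA record whose
# ID is in %$wanted. The check runs inside the loop (when the next header
# starts) and once after the loop, so every record is checked.
sub print_matches {
    my ($in, $out, $wanted) = @_;
    my ($fasta, $sequence);

    # Check the record collected so far and print it if its ID is wanted.
    my $flush = sub {
        return unless defined $fasta;
        print {$out} "$fasta\t$sequence\n" if exists $wanted->{$fasta};
    };

    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;               # skip blank lines
        if ($line =~ /^>/) {
            $flush->();                          # finish the previous record
            my @data = split ' ', $line;
            $fasta    = $data[1];                # assumed ID field, as in the code above
            $sequence = '';
        }
        else {
            $sequence .= $line;
        }
    }
    $flush->();                                  # don't forget the last record
    return;
}

# Demo with hypothetical in-memory data instead of real files.
my %fastahash;
$fastahash{$_} = 1 for qw(id1 id3);
my $fasta_txt = ">x id1\nAC\nGT\n>x id2\nTT\n>x id3\nGG\n";
open my $in,  '<', \$fasta_txt or die "Cannot open FASTA: $!";
my $result = '';
open my $out, '>', \$result    or die "Cannot open output: $!";
print_matches($in, $out, \%fastahash);
close $out;
print $result;
```

Nothing needs to survive the loop. Each record is handled as soon as the next header shows that it is complete.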
Re^5: fasta hash by moritz (Cardinal) on Aug 26, 2011 at 16:38 UTC
Re^6: fasta hash by morio56 (Initiate) on Aug 26, 2011 at 19:10 UTC