rolandomantilla has asked for the wisdom of the Perl Monks concerning the following question:

I have a FASTA array and I need to make it into a hash in order to look for an specific ID and get that specific sequence, any ideas this is what I have until now and i know im not good with Perl by trying hard

#!/usr/local/bin/perl #the file name $filename= hwk2.seq; use strict; use warnings; use lib '/class/bi617a/share/textbook'; use BeginPerlBioinfo; #Declare initialize the variables my @file_data=(); my $sequence= ''; my $query= ''; my %hash= ''; #Read the contents of the file hwk3.seq @file_data=get_file_data("hwk2.seq"); #Extract the sequence data from the content; %hash= map {$_ => 1 } @file_data; if (exists $hash{$query}){ print "This is sequence: $sequence"; }else { print "Not and ID"}; exit; sub get_file_data{ my($filename) = @_; use strict; use warnings; unless ( open (GET_FILE_DATA, $filename) ) { print STDERR "Cannot open file \"$filename\"\n\n"; exit; } @filedata= <GET_FILE_DATA>; close GET_FILE_DATA; return @filedata; }

Replies are listed 'Best First'.
Re: New to Perl
by Utilitarian (Vicar) on Aug 04, 2011 at 04:38 UTC
    Two One things catches my eye straight away....
    • Where do you assign a value to $query?
    • To define %hash you need to map {$hash{$_}=1}@file_data
    One final issue you should use strict and use warnings to catch typos and to ensure you have scoped your variables

    Update corrected my half asleep error, too little testing of code and coffee

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."
      To define %hash you need to map {$hash{$_}=1}@file_data

      Did the querent update the OP? This works just fine (and is less abusive):

      my %hash = map { $_ => 1 } @file_data;
        The $query is suppose to be the key for the hash which will be the the ID from the FASTA sequence
Re: New to Perl
by Marshall (Canon) on Aug 05, 2011 at 15:08 UTC
    There must be some sub contained with all the bio Perl modules that does this, but the format is simple and here is a parser for you. I didn't parse the older format that starts with ;. But you can add that if needed. See Wiki FASTA format. Blank lines between sequences are optional as well as ending a sequence with a '*';
    #!/usr/bin/perl -w use strict; use Data::Dump qw(pp); my %sequences; my $line; my $skip_read=0; while ($skip_read or defined ($line = <DATA>) ) { chomp $line; my ($id) = $line =~ /^\>(\w+)/; ($skip_read, $line) = finish_record($id,\%sequences) if ( defined +$id); } sub finish_record { my ($id, $seqHashRef) = @_; my $line; while (defined ($line = <DATA>) and $line !~ m/^\s*$/ and $line !~ m/^\>/) { chomp $line; if ($line =~ /\*$/) { $line =~ s/\*$//; $seqHashRef->{$id}.= $line; return 0; } $seqHashRef->{$id}.= $line; } print "$line\n" if defined $line; return (1, $line) if (defined ($line) and $line =~ m/^\>/); return 0; } print pp(\%sequences); =prints { MCHU => "ADQLTEEQIAEFKEAFSLFDKDGDGT....", gi => "LCLYTHIGRNIYYGSYLYSETWNTGI....", } =cut __DATA__ >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID DIDGDGQVNYEEFVQMMTAK* >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY
      ok that work although I'm still trying to complete my program. Let me tell you the main thing I need to input the clone ID: and then the program should give me the sequence as the awnser. I thought about making the fasta file into a hash and then using the ID as the key to return the sequence as the awnser. Thank you and this a very easy way to get although this is a very cool way of doing something similar.