New to Perl

rolandomantilla has asked for the wisdom of the Perl Monks concerning the following question:

I have a FASTA array and I need to make it into a hash in order to look for an specific ID and get that specific sequence, any ideas this is what I have until now and i know im not good with Perl by trying hard

#!/usr/local/bin/perl
#the file name
$filename= hwk2.seq;

use strict;
use warnings;
use lib '/class/bi617a/share/textbook';
use BeginPerlBioinfo;

#Declare initialize the variables
my @file_data=();
my $sequence= '';
my $query= '';
my %hash= '';

#Read the contents of the file hwk3.seq
@file_data=get_file_data("hwk2.seq");

#Extract the sequence data from the content;
%hash= map {$_ => 1 } @file_data;

if (exists $hash{$query}){
print "This is sequence: $sequence";
}else { print "Not and ID"};
exit;

sub get_file_data{
my($filename) = @_;
use strict;
use warnings;

unless ( open (GET_FILE_DATA, $filename) ) {
print STDERR "Cannot open file \"$filename\"\n\n";
exit;
}
@filedata= <GET_FILE_DATA>;
close GET_FILE_DATA;
return @filedata;
}
[download]

Comment on New to Perl Download Code

Replies are listed 'Best First'.
Re: New to Perl by Utilitarian (Vicar) on Aug 04, 2011 at 04:38 UTC
~~Two~~ One things catches my eye straight away.... Where do you assign a value to $query? ~~To define %hash you need to `map {$hash{$_}=1}@file_data`~~ One final issue you should use strict and use warnings to catch typos and to ensure you have scoped your variables Update corrected my half asleep error, too little testing of code and coffee `print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."`	[reply] [d/l] [select]
Re^2: New to Perl by chromatic (Archbishop) on Aug 04, 2011 at 05:32 UTC
To define %hash you need to `map {$hash{$_}=1}@file_data` Did the querent update the OP? This works just fine (and is less abusive): `my %hash = map { $_ => 1 } @file_data;`	[reply] [d/l] [select]
Re^3: New to Perl by rolandomantilla (Novice) on Aug 04, 2011 at 20:21 UTC
The $query is suppose to be the key for the hash which will be the the ID from the FASTA sequence	[reply]
Re^4: New to Perl by chromatic (Archbishop) on Aug 04, 2011 at 22:19 UTC
Re^5: New to Perl by rolandomantilla (Novice) on Aug 06, 2011 at 04:00 UTC
Some notes below your chosen depth have not been shown here
Re: New to Perl by Marshall (Canon) on Aug 05, 2011 at 15:08 UTC
There must be some sub contained with all the bio Perl modules that does this, but the format is simple and here is a parser for you. I didn't parse the older format that starts with ;. But you can add that if needed. See Wiki FASTA format. Blank lines between sequences are optional as well as ending a sequence with a ''; #!/usr/bin/perl -w use strict; use Data::Dump qw(pp); my %sequences; my $line; my $skip_read=0; while ($skip_read or defined ($line = <DATA>) ) { chomp $line; my ($id) = $line =~ /^\>(\w+)/; ($skip_read, $line) = finish_record($id,\%sequences) if ( defined +$id); } sub finish_record { my ($id, $seqHashRef) = @_; my $line; while (defined ($line = <DATA>) and $line !~ m/^\s$/ and $line !~ m/^\>/) { chomp $line; if ($line =~ /\$/) { $line =~ s/\$//; $seqHashRef->{$id}.= $line; return 0; } $seqHashRef->{$id}.= $line; } print "$line\n" if defined $line; return (1, $line) if (defined ($line) and $line =~ m/^\>/); return 0; } print pp(\%sequences); =prints { MCHU => "ADQLTEEQIAEFKEAFSLFDKDGDGT....", gi => "LCLYTHIGRNIYYGSYLYSETWNTGI....", } =cut __DATA__ >MCHU - Calmodulin - Human, rabbit, bovine, rat, and chicken ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQDMINEVDADGNGTID DIDGDGQVNYEEFVQMMTAK* >gi\|5524211\|gb\|AAD44166.1\| cytochrome b [Elephas maximus maximus] LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX IENY [download]	[reply] [d/l]
Re^2: New to Perl by rolandomantilla (Novice) on Aug 06, 2011 at 03:42 UTC
ok that work although I'm still trying to complete my program. Let me tell you the main thing I need to input the clone ID: and then the program should give me the sequence as the awnser. I thought about making the fasta file into a hash and then using the ID as the key to return the sequence as the awnser. Thank you and this a very easy way to get although this is a very cool way of doing something similar.	[reply]