Bio perl package

DanielM0412 has asked for the wisdom of the Perl Monks concerning the following question:

First off, I know there is a BioPerl package for the job I want to do, but I don't know where to look for it. I googled it but found nothing. So: I want to make a hash with unique keys, using this input:

>sequence10
 CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC
>sequence11
AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGC
>sequence12
CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG

(This is just a couple of lines.) I want the hash to look like this:

 
my %unique_hash = (
   '>sequence10' => 'CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC',
   '>sequence11' => 'AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGCA',
   '>sequence12' => 'CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG',
);

thanks -daniel


Replies are listed 'Best First'.
Re: Bio perl package by Marshall (Canon) on Aug 09, 2011 at 02:47 UTC
This looks like a simple version of the FASTA format. I think there has got to be some bio library that can parse all the variations of FASTA. I would recommend using that over my code, because it will cover at least one more formatting case. But, for your enjoyment, I posted a simple FASTA parser at Re: New to Perl. Have fun!
Re: Bio perl package by toolic (Bishop) on Aug 08, 2011 at 18:04 UTC
Loop through your input file lines one at a time; see perlintro -> Files and I/O. Use a regular expression (perlre) to find the sequence-name lines, and store the name in a scalar variable. Assign the other lines as hash values (perldata, perldsc). Write code, and if you still have detailed questions, post them here.
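
For what it's worth, the steps above could be sketched roughly like this in core Perl. It is only a minimal sketch: parse_fasta is a made-up helper name, the input is read from an in-memory string so the snippet runs on its own, and the keys are stored without the leading ">" (an assumption; keep the ">" in the capture if you want keys exactly as in the question).

```perl
use strict;
use warnings;

# Sample input copied from the question. In real use you would
# open the FASTA file itself instead of this in-memory string.
my $fasta = <<'END';
>sequence10
 CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC
>sequence11
AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGC
>sequence12
CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG
END

# parse_fasta is a hypothetical helper, not part of BioPerl or any module.
sub parse_fasta {
    my ($fh) = @_;
    my %seq_for;    # sequence name => sequence string
    my $name;       # name from the most recent header line
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim stray leading/trailing whitespace
        next if $line eq '';        # skip blank lines
        if ($line =~ /^>(\S+)/) {
            # Header line: remember the name and start an empty sequence.
            # A repeated name would overwrite the earlier entry.
            $name = $1;
            $seq_for{$name} = '';
        }
        elsif (defined $name) {
            # Sequence line: append, so multi-line sequences are joined.
            $seq_for{$name} .= $line;
        }
    }
    return %seq_for;
}

open my $fh, '<', \$fasta or die "Cannot open in-memory input: $!";
my %unique_hash = parse_fasta($fh);
close $fh;

print "$_ => $unique_hash{$_}\n" for sort keys %unique_hash;
```

Appending with .= rather than assigning means a sequence wrapped over several lines still ends up as one hash value. This only handles the plain layout shown in the question; a dedicated FASTA parser will cover more formatting cases.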