DanielM0412 has asked for the wisdom of the Perl Monks concerning the following question:

First off, I know there is a package in BioPerl for the job I want to do, but I don't know where to look for it. I googled it but found nothing. So I want to make a hash of unique keys, using this input:

>sequence10
CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC
>sequence11
AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGC
>sequence12
CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG

(This is just a couple of lines.) I want the hash to look like this:

my %unique_hash = (
    '>sequence10' => 'CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC',
    '>sequence11' => 'AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGC',
    '>sequence12' => 'CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG',
);

thanks -daniel

Replies are listed 'Best First'.
Re: Bio perl package
by Marshall (Canon) on Aug 09, 2011 at 02:47 UTC
    This looks like a simple version of the FASTA format. There has got to be some bio library that can parse all the variants of FASTA. I would recommend using that over my code, because it will cover at least one more formatting case than mine does.

    But, for your enjoyment: I posted one simple parser of the FASTA format at Re: New to Perl. Have fun!

Re: Bio perl package
by toolic (Bishop) on Aug 08, 2011 at 18:04 UTC
    • Loop through your input file lines one at a time. See perlintro -> Files and I/O
    • Use a regular expression to find the header lines (the ones starting with >): perlre, and store the header in a scalar variable.
    • Assign the other lines as hash values, keyed by the most recent header: perldata, perldsc
    Write code, and if you still have detailed questions, post them here.
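
The steps above can be sketched roughly as follows. This is a minimal sketch, not BioPerl's parser: the sub name parse_fasta and the variable names are made up for illustration. It assumes every header starts with > at the beginning of its own line, and it keeps the > in the key to match the hash the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read FASTA-like records from a
# filehandle and return a hash reference mapping each header to its sequence.
sub parse_fasta {
    my ($fh) = @_;
    my %seq_for;
    my $header;                               # most recent header seen

    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                   # drop newline and trailing whitespace
        next if $line eq '';                  # skip blank lines

        if ($line =~ /^(>\S+)/) {             # header line, e.g. ">sequence10"
            $header = $1;
            $seq_for{$header} //= '';         # make sure the key exists
        }
        elsif (defined $header) {
            $line =~ s/\s+//g;                # remove stray whitespace inside the sequence
            $seq_for{$header} .= $line;       # append, so wrapped sequences are joined
        }
        # Lines that appear before the first header are ignored.
    }
    return \%seq_for;
}

# Demo: the three records from the question, read from an in-memory filehandle.
my $input = <<'END_FASTA';
>sequence10
CGCTCCCACCCGCCCCTTCCCCAGCCTGCGGCTTTC
>sequence11
AAACCGAACCTTCCGGGAATCGGAAGGCGCCGGGC
>sequence12
CAGGGCCAAGGGGTGGGCAGCATGGAGGTGCAGGG
END_FASTA

open my $fh, '<', \$input or die "Cannot open in-memory input: $!";
my $unique_hash = parse_fasta($fh);
close $fh;

print "$_ => $unique_hash->{$_}\n" for sort keys %$unique_hash;
```

To read a real file, open it with open my $fh, '<', $filename and pass that handle instead. Note one design choice in this sketch: if the same header shows up twice, the second sequence is appended to the first rather than replacing it, so check for duplicates yourself if that matters for your data.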