in reply to Re: Saving different values for the same key by using Hash of Arrays
in thread Saving different values for the same key by using Hash of Arrays

Thanks a lot for your quick and detailed answer, but the script doesn't actually print anything! How is this possible?

Re^3: Saving different values for the same key by using Hash of Arrays
by Kenosis (Priest) on May 07, 2012 at 18:23 UTC

    It's possible because my regex didn't work with your reformatted FASTA records. :) aaron_baugher's suggestion to repost your records using <code> or <pre> was spot on, and helped with crafting the following new-and-improved solution--after your re-posting:

    use strict;
    use warnings;

    my %FASTAhash;

    {
        local $/ = '>';
        open my $file, '<FASTA.txt' or die $!;

        while (<$file>) {
            next if !/(.*?)\n/;
            chomp( $FASTAhash{$1} = $' )
              if !$FASTAhash{$1}
              or length $' > length $FASTAhash{$1};
        }
    }

    print ">$_\n$FASTAhash{$_}" for keys %FASTAhash;

    Within a block, we start by letting perl know that '>' is the new record separator, instead of the default "\n" (so we read the file a FASTA record at a time, instead of a line at a time), and then tweak the regex a bit to grab the ID. Because chomp removes the current record separator, it strips the trailing '>' from each record.
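To see the record-separator idea in isolation, here is a minimal sketch. It reads from an in-memory string rather than FASTA.txt, and the sample data, the read_records name, and the two-element record layout are all assumptions made up for illustration, not part of the solution above.

```perl
use strict;
use warnings;

# Hypothetical sample data, held in memory instead of FASTA.txt.
my $fasta = ">id1\nAAAA\nCCCC\n>id2\nGG\n";

# Hypothetical helper: returns a list of [ id, sequence ] pairs.
sub read_records {
    my ($text) = @_;
    my @records;

    local $/ = '>';                        # read one FASTA record per <$fh>
    open my $fh, '<', \$text or die $!;    # in-memory filehandle

    while ( my $chunk = <$fh> ) {
        chomp $chunk;                      # removes the trailing '>' (the current $/)

        # The first line is the ID; everything after it is the sequence.
        # The empty chunk before the first '>' has no newline and is skipped.
        next unless $chunk =~ /\A(.*?)\n(.*)\z/s;
        push @records, [ $1, $2 ];
    }
    return @records;
}

for my $record ( read_records($fasta) ) {
    my ( $id, $seq ) = @$record;
    $seq =~ tr/\n//d;                      # join the wrapped sequence lines
    print "$id => $seq\n";
}
```

With the sample data this prints "id1 => AAAACCCC" and "id2 => GG". Using local keeps the change to $/ confined to the sub, so code elsewhere still reads a line at a time.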

    You'll note that we don't use close $file; when we're done, since the file's automatically closed when my $file falls out of scope (when the block ends).
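The scope-based close can be checked with a small experiment. This is only a sketch: the scratch file and the write_without_close name are hypothetical, chosen for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this demonstration.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
close $tmp_fh;

# Hypothetical helper: writes to a lexical filehandle and never closes it.
sub write_without_close {
    my ( $file, $text ) = @_;
    open my $out, '>', $file or die $!;
    print {$out} $text;
    return;    # $out goes out of scope here, so perl closes the handle
}

write_without_close( $path, "ACGT\n" );

# The text is already on disk, which shows the handle was flushed and closed.
open my $in, '<', $path or die $!;
my $line = <$in>;
close $in;
print $line;    # prints ACGT
```

One trade-off worth knowing: an implicit close does not report errors, so when writing data that matters, an explicit close $out or die $!; is the safer habit. For a read-only handle like the one in the FASTA script, relying on scope is fine.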

    Here's the output:

    >ENSG00000147724
    MSEIQGTVEFSVELHKFYNVDLFQRGYYQIRVTLKVSSRIPHRLSASIAGQTESSSLHSA
    CVHDSTVHSRVFQILYRNEEVPINDAVVFRVHLLLGGERMEDALSEVDFQLKVDLHFTDS
    EQQLRDVAGAPMVSSRTLGLHFHPRNGLHHQVP
    >ENSG00000067082
    Sequence unavailable
    >ENSG00000010072
    MDDDLMLALRLQEEWNLQEAERDHAQESLSLVDASWELVDPTPDLQALFVQFNDQFFWGQ
    LEAVEVKWSVRMTLCAGICSYEGKGGMCSIRLSEPLLKLRPRKDLVETLLHEMIHAYLFV
    TNNDKDREGHGPEFCKHMHRINSLTGANITVYHTFHDEVDEYRRHWWRCNGPCQHRPPYY
    GYVKRATNREPSAHDYWWAEHQKTCGGTYIKIKEPENYSKKGKGKAKLGKEPVLAAENKD
    KPNRGEAQLVIPFSGKGYVLGETSNLPSPGKLITSHAINKTQDLLNQNHSANAVRPNSKI
    KVKFEQNGSSKNSHLVSPAVSNSHQNVLSNYFPRVSFANQKAFRGVNGSPRISVTVGNIP
    KNSVSSSSQRRVSSSKISLRNSSKVTESASVMPSQDVSGSEDTFPNKRPRLEDKTVFDNF
    FIKKEQIKSSGNDPKYSTTTAQNSSSSSSQSKMVNCPVCQNEVLESQINEHLDWCLEGDS
    IKVKSEESL*

    Hope this version's helpful!

    Update: After posting the above, I just noticed aaron_baugher's solution using $/ = '>', which I think makes good sense, since '>' is the FASTA record delimiter.

      Both versions of the code work perfectly! Thank you a lot guys, your help has been invaluable!!!

      I hope that time will make me more confident with Perl so that one day I too can be useful to someone in need.

        Thank you for your message! Am glad our code worked for you...

        The best to you.