That's likely because my regex didn't work with your reformatted FASTA records. :) aaron_baugher's suggestion to repost your records using <code> or <pre> tags was spot on. Once you reposted them, I could craft the following new-and-improved solution:
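```perl
use strict;
use warnings;

my %FASTAhash;

{
    # Read one FASTA record (not one line) at a time
    local $/ = '>';
    open my $file, '<', 'FASTA.txt' or die "Can't open FASTA.txt: $!";

    while (<$file>) {
        # Skip anything without an ID line (e.g. the empty first "record")
        next if !/(.*?)\n/;

        # $1 is the ID; $' is everything after the ID line (the sequence).
        # Keep it if it's the first sequence for this ID or longer than the
        # one stored. chomp strips the trailing '>' because $/ is '>'.
        chomp( $FASTAhash{$1} = $' )
          if !$FASTAhash{$1}
          or length $' > length $FASTAhash{$1};
    }
}

print ">$_\n$FASTAhash{$_}" for keys %FASTAhash;
```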
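Inside a bare block, we start by telling Perl that '>' is the new input record separator ($/), instead of the default "\n". That way the file is read one FASTA record at a time, instead of one line at a time. Because chomp removes whatever $/ currently holds, it now strips the trailing '>' from each record rather than a newline. I also tweaked the regex a bit so it grabs the ID: everything up to the first newline.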
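If you'd like to watch the '>' separator at work without a FASTA.txt on disk, here's a minimal sketch that reads from an in-memory string instead (Perl can open a filehandle on a scalar reference). The sub name longest_per_id and the sample records are hypothetical, made up just for illustration. The loop follows the same idea as the script above, but captures the sequence with a second group instead of using $':

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hashref of ID => longest sequence,
# reading FASTA text from a string rather than from FASTA.txt.
sub longest_per_id {
    my ($text) = @_;
    my %seq_for;

    # Open a filehandle on a scalar reference (an in-memory "file")
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";

    local $/ = '>';    # one FASTA record per read; restored when the sub returns
    while ( my $record = <$fh> ) {
        chomp $record;    # removes the trailing '>' (the current $/), if present

        # ID is everything up to the first newline; the rest is the sequence
        next unless $record =~ /^(.*?)\n(.*)/s;
        my ( $id, $seq ) = ( $1, $2 );

        # Keep the first sequence seen for an ID, or a longer one
        $seq_for{$id} = $seq
          if !exists $seq_for{$id} or length $seq > length $seq_for{$id};
    }
    return \%seq_for;
}

my $fasta = ">A\nMKV\n>B\nMSEIQ\nGT\n>A\nMKVLL\n";
my $seqs  = longest_per_id($fasta);
print ">$_\n$seqs->{$_}" for sort keys %$seqs;
```

With that sample, the second, longer "A" record wins, so the printout is >A / MKVLL followed by >B / MSEIQ / GT. Note that the very first read returns just ">" (the text before the first separator), which the regex rejects, and the last record has no trailing '>', so chomp leaves it alone.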
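You'll note that we don't call close $file; when we're done, since the file is automatically closed when my $file falls out of scope (when the block ends). Likewise, local restores $/ to its default "\n" at the end of the block, so the rest of the script reads lines normally again.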
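Here's a small sketch of both scoping effects, again using an in-memory string so it runs anywhere. To show that the handle really goes away when the block ends, it keeps a weak copy of the filehandle reference (via Scalar::Util's weaken), which doesn't keep the handle alive; the variable names are just for illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $data = "line one\nline two\n";
my $weak_fh;    # will hold a weak copy of the filehandle reference

{
    local $/ = '>';    # only in effect until the closing brace
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    $weak_fh = $fh;
    weaken($weak_fh);  # a weak reference doesn't keep the handle alive
    print "inside:  \$/ is '", $/, "'\n";
}   # $fh goes out of scope here, so the handle is freed (and closed)

print "outside: \$/ is ", ( $/ eq "\n" ? 'the default "\n"' : "'$/'" ), "\n";
print "handle ", ( defined $weak_fh ? 'still alive' : 'closed and freed' ), "\n";
```

After the block, $/ is back to "\n" and the weak reference has become undef, because nothing else was holding on to the handle.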
Here's the output:
```
>ENSG00000147724
MSEIQGTVEFSVELHKFYNVDLFQRGYYQIRVTLKVSSRIPHRLSASIAGQTESSSLHSA
CVHDSTVHSRVFQILYRNEEVPINDAVVFRVHLLLGGERMEDALSEVDFQLKVDLHFTDS
EQQLRDVAGAPMVSSRTLGLHFHPRNGLHHQVP
>ENSG00000067082
Sequence unavailable
>ENSG00000010072
MDDDLMLALRLQEEWNLQEAERDHAQESLSLVDASWELVDPTPDLQALFVQFNDQFFWGQ
LEAVEVKWSVRMTLCAGICSYEGKGGMCSIRLSEPLLKLRPRKDLVETLLHEMIHAYLFV
TNNDKDREGHGPEFCKHMHRINSLTGANITVYHTFHDEVDEYRRHWWRCNGPCQHRPPYY
GYVKRATNREPSAHDYWWAEHQKTCGGTYIKIKEPENYSKKGKGKAKLGKEPVLAAENKD
KPNRGEAQLVIPFSGKGYVLGETSNLPSPGKLITSHAINKTQDLLNQNHSANAVRPNSKI
KVKFEQNGSSKNSHLVSPAVSNSHQNVLSNYFPRVSFANQKAFRGVNGSPRISVTVGNIP
KNSVSSSSQRRVSSSKISLRNSSKVTESASVMPSQDVSGSEDTFPNKRPRLEDKTVFDNF
FIKKEQIKSSGNDPKYSTTTAQNSSSSSSQSKMVNCPVCQNEVLESQINEHLDWCLEGDS
IKVKSEESL*
```
Hope this version's helpful!
Update: After posting the above, I just noticed aaron_baugher's solution, which also uses $/ = '>'. That makes good sense, since '>' is the FASTA record delimiter.
In reply to Re^3: Saving different values for the same key by using Hash of Arrays by Kenosis, in thread Saving different values for the same key by using Hash of Arrays by beginner27