You've chosen an effective use for a hash, but if you only need the longest sequence for each ID (whether that ID has one sequence or several), consider the following solution, which doesn't use an array:

use strict;
use warnings;

my %FASTAhash;

open my $file, '<', 'FASTA.txt' or die "Can't open FASTA.txt: $!";
while (<$file>) {
    next if !/(>[^ ]+) /;
    chomp( $FASTAhash{$1} = $' )
      if !$FASTAhash{$1}
          or length $' > length $FASTAhash{$1};
}
close $file;

print "$_ $FASTAhash{$_}\n" for keys %FASTAhash;

The regex matches the ID, which is captured into $1, and leaves the rest of the line (the sequence that follows the matched space) in $'. If the hash has no value yet for the ID in $1, or the sequence in $' is longer than the one already stored, the sequence is assigned to that hash item and then chomped. When the loop finishes, each ID is paired with its longest sequence. (Is it possible for two sequences of the same ID to be the same length? If so, do you need to code for that?)
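On the equal-length question: in the code above, $' still carries its trailing newline when the lengths are compared, while the stored value has already been chomped, so for newline-terminated lines a later sequence of the same length replaces the earlier one. If you'd rather keep the first, here is a minimal sketch, not a definitive implementation. It assumes the same "ID, space, sequence" line layout, chomps before comparing, and uses explicit captures instead of $'. The sub name longest_per_id and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: reads "ID sequence"
# lines from a filehandle and returns a hash reference that maps each ID
# to its longest sequence. On a tie, the first sequence seen is kept.
sub longest_per_id {
    my ($fh) = @_;
    my %longest;

    while ( my $line = <$fh> ) {
        chomp $line;    # strip the newline before measuring anything

        # The ID runs from '>' to the first whitespace; the rest is the sequence.
        my ( $id, $seq ) = $line =~ /^(>\S+) (.*)$/ or next;

        # Strict '>' means an equal-length sequence never replaces the stored one.
        $longest{$id} = $seq
            if !exists $longest{$id}
            or length($seq) > length( $longest{$id} );
    }
    return \%longest;
}

# Made-up sample data, read through an in-memory filehandle.
my $data = ">ID1 AAA\n>ID1 AAAAA\n>ID2 CC\n>ID1 GGGGG\n";
open my $fh, '<', \$data or die "Can't open in-memory data: $!";
my $longest = longest_per_id($fh);
close $fh;

print "$_ $longest->{$_}\n" for sort keys %$longest;
```

With this sample, ID1 keeps AAAAA rather than the later, equally long GGGGG. To prefer the last sequence on a tie instead, change the comparison to >=.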

Output from processing your data:

>ENSG00000147724 MSEIQGTVEFSVELHKFYNVDLFQRGYYQIRVTLKVSSRIPHRLSASIAGQTE
+SSSLHSA CVHDSTVHSRVFQILYRNEEVPINDAVVFRVHLLLGGERMEDALSEVDFQLKVDLHFTDS 
+EQQLRDVAGAPMVSSRTLGLHFHPRNGLHHQVP
>ENSG00000010072 MDDDLMLALRLQEEWNLQEAERDHAQESLSLVDASWELVDPTPDLQALFVQFN
+DQFFWGQ LEAVEVKWSVRMTLCAGICSYEGKGGMCSIRLSEPLLKLRPRKDLVETLLHEMIHAYLFV 
+TNNDKDREGHGPEFCKHMHRINSLTGANITVYHTFHDEVDEYRRHWWRCNGPCQHRPPYY GYVKRATN
+REPSAHDYWWAEHQKTCGGTYIKIKEPENYSKKGKGKAKLGKEPVLAAENKD KPNRGEAQLVIPFSGK
+GYVLGETSNLPSPGKLITSHAINKTQDLLNQNHSANAVRPNSKI KVKFEQNGSSKNSHLVSPAVSNSH
+QNVLSNYFPRVSFANQKAFRGVNGSPRISVTVGNIP KNSVSSSSQRRVSSSKISLRNSSKVTESASVM
+PSQDVSGSEDTFPNKRPRLEDKTVFDNF FIKKEQIKSSGNDPKYSTTTAQNSSSSSSQSKMVNCPVCQ
+NEVLESQINEHLDWCLEGDS IKVKSEESL*
>ENSG00000067082 Sequence unavailable

Hope this helps!

In reply to Re: Saving different values for the same key by using Hash of Arrays by Kenosis
in thread Saving different values for the same key by using Hash of Arrays by beginner27
