in reply to splitting data advice requested

How about this?
#!/usr/bin/perl
use strict;
use Data::Dumper;

local $/ = ">";
my %hash;

while (<DATA>) {
    s/>//;
    s/(^SEQ\d)//;
    $hash{">". $1} = $_ if ( defined( $1 ));
}

print Dumper(\%hash);

__DATA__
>SEQ1
-----I--RL--AAIDVDG-NLT----------D--R--D-RL-ISTKA-IESIRS--A-
-E-K--------K-GLT-VSL----LS------GN-V----I-PVV---YA-L------K
IF---------------L-----GINGPVF------------------------------
>SEQ2
-MKI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
-E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
IL---------------I-----G----TS-----------------GP-VV--------
>SEQ3
--KI----KA--ISIDIDG-TIT------YPN-R-------MIHEK--A-LEAIRR--A-
-E-S--------L-GIP-IML----VT------GN-T----V-QFA---EA-A------S
IL---------------I-----G----TS-----------------GP-VV--------
---AE--D------GG---A---------------------------------------I

Re^2: splitting data advice requested
by oxone (Friar) on May 13, 2009 at 22:19 UTC
    I like your approach, because it avoids the problem with the "full file slurping" options suggested above (i.e. they won't scale to very large input files), and it recognises that '>' is a handy record delimiter here. Here is a small suggested improvement that also removes the line breaks, as the OP asked, and is a bit shorter:
    ...
    while (<DATA>) {
        $hash{">$1"} = $2 if s/[>\n]//g && /^(SEQ\d+)(.*)/;
    }
    ...
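
For anyone who wants to try the idea outside the thread's `DATA` block, here is a minimal, self-contained sketch of the same record-at-a-time technique. It is not the code from either post: the `parse_records` helper and the tiny sample input are made up for illustration. It also splits the ID from the sequence at the first newline, which is an assumption about the input layout (ID alone on the header line), so that a sequence starting with a digit would not be absorbed into the ID.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads ">"-delimited records
# from a filehandle and returns a hash ref of ">ID" => joined sequence.
sub parse_records {
    my ($fh) = @_;
    local $/ = ">";              # one read per ">"-terminated record, not per line
    my %seq;
    while ( my $rec = <$fh> ) {
        chomp $rec;              # strips the trailing ">" (the current $/), if any
        # Assumes the ID sits alone on the header line, as in the thread's data.
        next unless $rec =~ /^(SEQ\d+)\n(.*)/s;
        my ( $id, $body ) = ( $1, $2 );
        $body =~ tr/\n//d;       # join the wrapped sequence lines
        $seq{">$id"} = $body;
    }
    return \%seq;
}

# Small made-up input, read through an in-memory filehandle.
my $input = ">SEQ1\n--AB\nCD--\n>SEQ10\nEF-\n-GH\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";
my $seq = parse_records($fh);
close $fh;

print "$_ => $seq->{$_}\n" for sort keys %$seq;
```

With the sample input above this prints `>SEQ1 => --ABCD--` and `>SEQ10 => EF--GH`. Only one record is held in memory at a time while reading, which is the scaling advantage over slurping the whole file; the resulting hash still holds every sequence, of course.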