Your records always start with a '>'. But you don't have to treat that character as the start of a record; you can treat it as a record separator. If you set $/ = '>';, each read returns a whole record rather than a line. That (in my mind) simplifies the process. Here's one way to do it:
use strict;
use warnings;
use Data::Dumper;

local $/ = '>';

my %hash;

while( <DATA> ) {
    if( my($k,$v) = /
        ^(\N+)\n        # Capture the key.
        ([CTGA]+)$      # Capture the value.
    /mx ) {
        $hash{$k}=$v;
    }
}

print Dumper \%hash;

__DATA__
>sequence_5849
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>sequence_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC
>sequence_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT
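To see what the reads actually return once $/ is '>', here is a minimal sketch. The sample data and the name read_records are made up for illustration, and the input is read from an in-memory filehandle so the snippet stands alone. Note that the very first read returns just the leading '>', which is why the regex approach simply finds nothing to match in it.

```perl
use strict;
use warnings;

# Hypothetical sample input, kept in a string so the sketch is self-contained.
my $fasta = ">seq_a\nACGT\n>seq_b\nTTGA\n";

# read_records is an illustrative helper name, not part of any module.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = '>';                 # '>' now terminates each record
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;                 # chomp removes the current $/, i.e. a trailing '>'
        next unless length $rec;    # the first read is only '>', so it is empty here
        push @records, $rec;
    }
    close $fh;
    return @records;
}

my @records = read_records($fasta);
print scalar(@records), " records\n";
```

Each element of @records holds a header line, a newline, and the sequence line, which is the shape the regex above expects.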
Update: My solution assumes that there are no newlines embedded in the CTG.... sequences. If there are, your regex should probably accept them, and you can then use tr/// to filter them out again. Also, since you're modifying the input record separator, it's wise to constrain its effects to as narrow a scope as possible. The following does that:
print Dumper sub {
    local $/ = '>';
    my( $fh, %recs ) = shift;
    while( <$fh> ) {
        if( my($k,$v) = /
            ^(\N+)\n        # Capture the key.
            ([CTGA]+)$      # Capture the value.
        /mx ) {
            $recs{$k}=$v;
        }
    }
    return \%recs;
}->(\*DATA);
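For the case mentioned in the update, where a sequence is wrapped over several lines, something along these lines might work. This is only a sketch: parse_fasta and the sample data are hypothetical names chosen for illustration, and it assumes the sequences contain nothing but C, T, G, A and newlines. Records that contain anything else are skipped.

```perl
use strict;
use warnings;

# parse_fasta is an illustrative name; it takes an already opened filehandle.
sub parse_fasta {
    my ($fh) = @_;
    local $/ = '>';                 # scoped to this sub only
    my %recs;
    while ( my $rec = <$fh> ) {
        chomp $rec;                 # drop the trailing '>' separator
        if ( my ( $k, $v ) = $rec =~ /
            ^(\N+)\n                # Capture the key (the header line).
            ([CTGA\n]+)\z           # Capture the value, embedded newlines included.
        /x ) {
            $v =~ tr/\n//d;         # Delete the newlines from the captured value.
            $recs{$k} = $v;
        }
    }
    return \%recs;
}

# Hypothetical sample with one sequence wrapped across two lines.
my $text = ">seq_a\nACGT\nTTGA\n>seq_b\nCCGG\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $recs = parse_fasta($fh);
close $fh;

print "$_ => $recs->{$_}\n" for sort keys %$recs;
```

The character class accepts newlines so that the whole wrapped sequence is captured, and tr/\n//d then removes them before the value is stored.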
Dave
In reply to Re: Reading file into a hash by davido, in thread Reading file into a hash by PerlSufi