in reply to Reading file into a hash

Your records always start with a '>'. But you don't have to treat that character as the start of a record; you can treat it as a record separator. If you set $/ = '>';, your file reads will return whole records instead of lines. That (in my mind) simplifies the process. Here's one way to do it:

use strict;
use warnings;
use Data::Dumper;

local $/ = '>';

my %hash;

while( <DATA> ) {
  if( my($k,$v) = /
      ^(\N+)\n    # Capture the key.
      ([CTGA]+)$  # Capture the value.
    /mx
  ) {
    $hash{$k}=$v;
  }
}

print Dumper \%hash;

__DATA__
>sequence_5849
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>sequence_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC
>sequence_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT
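To make it clearer what each read returns once $/ is '>', here is a minimal sketch. The two-record sample and the in-memory filehandle are my own stand-ins, not the original data. The very first read returns only the leading '>', and chomp removes a trailing $/ (here '>') rather than a newline.

```perl
use strict;
use warnings;

# Hypothetical two-record sample, kept in a string so the sketch is self-contained.
my $sample = ">id_1\nACGT\n>id_2\nTTGA\n";

my @records;
{
    local $/ = '>';                   # Records now end at '>' instead of "\n".
    open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
    while ( my $rec = <$fh> ) {
        chomp $rec;                   # chomp strips a trailing $/, here '>'.
        next unless length $rec;      # The first read is just the leading '>'.
        push @records, $rec;
    }
    close $fh;
}

printf "%d records\n", scalar @records;
print "---\n$_" for @records;
```

Each element of @records still holds the key line, a newline, and the sequence line, which is why the regex above captures the two parts separately.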

Update: My solution assumes that there aren't newlines embedded in the CTG.... sequences. If there are, your regex should accept them in the capture, and you can then use tr/// to filter them out of the captured value. Also, since you're modifying the input record separator, it's wise to constrain its effects to as narrow a scope as possible. The following does that:

print Dumper sub {
  local $/ = '>';
  my( $fh, %recs ) = shift;
  while( <$fh> ) {
    if( my($k,$v) = /
        ^(\N+)\n    # Capture the key.
        ([CTGA]+)$  # Capture the value.
      /mx
    ) {
      $recs{$k}=$v;
    }
  }
  return \%recs;
}->(\*DATA);
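For the embedded-newline case mentioned in the update, something along these lines might work. It is only a sketch: the sub name read_wrapped_records and the sample data are hypothetical, and it assumes the sequences contain nothing but C, T, G, A and "\n". The character class is widened to accept newlines, and tr/\n//d then deletes them from the captured value.

```perl
use strict;
use warnings;

# Hypothetical helper name; not part of the original solution.
sub read_wrapped_records {
    my ($fh) = @_;
    local $/ = '>';                  # Change is limited to this sub's dynamic scope.
    my %recs;
    while ( my $rec = <$fh> ) {
        if ( my ( $k, $v ) = $rec =~ /
                ^(\N+)\n             # Capture the key.
                ([CTGA\n]+)          # Capture the value, newlines included.
            /x
        ) {
            $v =~ tr/\n//d;          # Filter the newlines out again.
            $recs{$k} = $v;
        }
    }
    return \%recs;
}

# Hypothetical sample in which each sequence is wrapped over two lines.
my $sample = ">id_1\nACGT\nTTGA\n>id_2\nCCCC\nGGGG\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $recs = read_wrapped_records($fh);
close $fh;

print "$_ => $recs->{$_}\n" for sort keys %$recs;
```

The /m flag is dropped here on purpose, so that ^ anchors only at the start of the record and not after every embedded newline.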

Dave

Re^2: Reading file into a hash
by PerlSufi (Friar) on May 29, 2014 at 19:45 UTC
    Very Nice, davido. I have not used $/ much so I will keep that in mind for future reference :)