Re: Reading file into a hash

Your records always start with a '>'. But you don't have to treat that character as the start of a record; you can treat it as a record separator. If you set $/ = '>'; then your file reads will return records rather than lines. That (in my mind) simplifies the process. Here's one way to do it:

use strict;
use warnings;
use Data::Dumper;

local $/ = '>';
my %hash;

while( <DATA> ) {
  if( my($k,$v)
    = /
        ^(\N+)\n    # Capture the key.
        ([CTGA]+)$     # Capture the value.
      /mx
  ) {
    $hash{$k}=$v;
  }
}

print Dumper \%hash;

__DATA__
>sequence_5849
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG
>sequence_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC
>sequence_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT
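
For a feel of what the reads return once $/ is '>', here's a minimal sketch (not part of the script above) that reads from an in-memory string instead of DATA. The sample data and the variable names are made up for illustration. Since the input begins with '>', the first read returns only the separator itself; chomp removes whatever $/ currently holds, so that read ends up empty and can be skipped.

```perl
use strict;
use warnings;

# Hypothetical sample input, held in memory so the sketch is self-contained.
my $fasta = ">seq_a\nCCTG\n>seq_b\nGATTACA\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '>';                # Each read now ends at the next '>'.
    while ( my $rec = <$fh> ) {
        chomp $rec;                # chomp strips the current $/, here a trailing '>'.
        next unless length $rec;   # The very first read is just '>', empty after chomp.
        push @records, $rec;
    }
}

# Each element holds the key line followed by the sequence line.
print scalar(@records), " records read\n";
```

The regex in the script above gets away without the chomp because the trailing '>' sits on its own line after the sequence, and a lone '>' has no newline after it for the pattern to match.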

Update: My solution assumes that there are no newlines embedded in the CTG... sequences. If there are, your regex should probably accept them, and you can then use tr/// to strip them out of the captured value. Also, since you're modifying the input record separator, it's wise to constrain its effect to as narrow a scope as possible. The following does that:

print Dumper sub {
  local $/ = '>';
  my( $fh, %recs ) = shift;
  while( <$fh> ) {
    if( my($k,$v) = /
        ^(\N+)\n    # Capture the key.
        ([CTGA]+)$     # Capture the value.
      /mx
    ) {
      $recs{$k}=$v;
    }
  }
  return \%recs;  
}->(\*DATA);
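
If the sequences do wrap across lines, one way to handle it might look like the sketch below. It is only an illustration under assumptions: the sample data is invented, read_records is a hypothetical helper name, and \N needs Perl 5.12 or newer. The character class accepts newlines along with the bases, and tr/// then deletes them from the captured value.

```perl
use strict;
use warnings;

# Hypothetical input whose sequences are wrapped over several lines.
my $fasta = ">seq_a\nCCTG\nAAGT\n>seq_b\nGATT\nACA\n";

# read_records is a made-up name for this sketch.
sub read_records {
    my ($fh) = @_;
    local $/ = '>';    # Confined to this sub, restored on return.
    my %recs;
    while ( my $rec = <$fh> ) {
        chomp $rec;    # Drop the trailing '>' separator.
        # First line is the key; the rest may contain bases and newlines.
        if ( my ( $k, $v ) = $rec =~ /\A(\N+)\n([CTGA\n]+)\z/ ) {
            $v =~ tr/\n//d;    # Remove the embedded newlines.
            $recs{$k} = $v;
        }
    }
    return \%recs;
}

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $recs = read_records($fh);
print "$_ => $recs->{$_}\n" for sort keys %$recs;
```

With this sample data, seq_a should come out as CCTGAAGT and seq_b as GATTACA. Records that contain anything other than C, T, G, A and newlines are silently skipped, which may or may not be what you want.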

Dave

Re^2: Reading file into a hash by PerlSufi (Friar) on May 29, 2014 at 19:45 UTC
Very Nice, davido. I have not used $/ much so I will keep that in mind for future reference :)