mabeuf has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I've been writing a script for working with DNA (bioinformatics again!) but I've had trouble trying to put the header and the sequence into a hash. Both $n and $m are defined correctly, but only the $key will come out defined (even if I swap $m and $n over). I'm sure it's something simple but I can't work it out.

Example input:

@lines:

    >seq1
    ASDFGHASDFGHJ
    ERTYUIOOIUYLK
    NBGFEWERTY
    >seq2
    BGTNHYMJUKOPK
    MNBFSDFGHJ
    ....

    my $line;
    my ($m, $n);
    my %all_hash = ();
    foreach $line (@lines){
        chomp $line;
        if($line =~ /^>/){
            $n = "\n$line\n";
        }else{
            $m = $line;
        }
        $all_hash{ $n } = $m;
    }
    print %all_hash;

Any thoughts would be appreciated!

Re: inputting into hash error
by toolic (Bishop) on Apr 30, 2011 at 14:06 UTC
    Since you did not show the output you expect, I wonder if you really want HASHES OF ARRAYS:

        use warnings;
        use strict;
        use Data::Dumper;

        my @lines = <DATA>;
        my $n;
        my %all_hash;
        for (@lines) {
            chomp;
            if (/^>/) {
                $n = $_;
            }
            else {
                push @{ $all_hash{ $n } }, $_;
            }
        }
        print Dumper(\%all_hash);

        __DATA__
        >seq1
        ASDFGHASDFGHJ
        ERTYUIOOIUYLK
        NBGFEWERTY
        >seq2
        BGTNHYMJUKOPK
        MNBFSDFGHJ

    Prints:

        $VAR1 = {
                  '>seq1' => [
                               'ASDFGHASDFGHJ',
                               'ERTYUIOOIUYLK',
                               'NBGFEWERTY'
                             ],
                  '>seq2' => [
                               'BGTNHYMJUKOPK',
                               'MNBFSDFGHJ'
                             ]
                };

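    If one string per sequence is needed later, the per-line arrays can be joined. A small sketch, assuming the same structure as the Dumper output above (the %joined name is hypothetical, not part of the original reply):

```perl
use strict;
use warnings;

# Same structure as the Dumper output, written out literally.
my %all_hash = (
    '>seq1' => [ 'ASDFGHASDFGHJ', 'ERTYUIOOIUYLK', 'NBGFEWERTY' ],
    '>seq2' => [ 'BGTNHYMJUKOPK', 'MNBFSDFGHJ' ],
);

# %joined is a hypothetical name: one concatenated string per header.
my %joined;
for my $header (keys %all_hash) {
    $joined{$header} = join '', @{ $all_hash{$header} };
}

for my $header (sort keys %joined) {
    print "$header $joined{$header}\n";
}
```

    Keeping the lines in an array until the end leaves the choice of separator (none, newline, etc.) to the point where the data is used.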
Re: inputting into hash error
by ww (Archbishop) on Apr 30, 2011 at 14:23 UTC
    • Because your $all_hash{ $n } = $m; is inside the for loop, it runs on every iteration and overwrites whatever was previously stored under that key, so each header ends up with only the last line of its sequence.
    • Your variable names make your life more difficult.
    • As written, the value of your $n includes the > sign. Do you really want that?
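
    The first point can be seen in isolation. A minimal sketch, using the first record from the question (%overwritten, %appended and $header are hypothetical names chosen for illustration):

```perl
use strict;
use warnings;

my @lines = ('>seq1', 'ASDFGHASDFGHJ', 'ERTYUIOOIUYLK', 'NBGFEWERTY');

my (%overwritten, %appended);
my $header;
for my $line (@lines) {
    if ($line =~ /^>/) {
        $header = $line;   # remember which sequence we are in
        next;              # store nothing for the header line itself
    }
    $overwritten{$header} = $line;   # plain assignment replaces the previous value
    $appended{$header}   .= $line;   # .= appends to whatever is already stored
}

print "overwritten: $overwritten{'>seq1'}\n";   # only the last line survives
print "appended:    $appended{'>seq1'}\n";      # all three lines, concatenated
```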

    The following may be what you're seeking:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use 5.012;

        # 902200

        my @lines = qw(
        >seq1
        ASDFGHASDFGHJ
        ERTYUIOOIUYLK
        NBGFEWERTY
        >seq2
        BGTNHYMJUKOPK
        MNBFSDFGHJ
        >seq3
        USE_STRICT&USE_WARNINGS
        lastline
        );

        my $line;
        my ($DNA, $seq); # descriptive var names
        my %all_hash = ();

        for my $line (@lines){ # insert after this line, see update 2
            chomp $line;
            if ($line =~ /^>(seq\d)/ ) { # captures seq# withOUT the '>'
                $seq = "$1\n";
                say "\n\$seq: $seq"; # useful for debug; otherwise, not
            } else {
                $DNA = "$line\n";
                say "\$DNA: $DNA";
            }

            no warnings; # otherwise, warns 'uninitialized' for the first $DNA
            $all_hash{ $seq } .= $DNA; # concat $DNAs, which now
                                       # have '\n's restored for readability
            use warnings;
        }

        print "\n =============== \n";
        print %all_hash;

    Output (when fixed as per update 2):

        $seq: seq1
        $DNA: ASDFGHASDFGHJ
        $DNA: ERTYUIOOIUYLK
        $DNA: NBGFEWERTY

        $seq: seq2
        $DNA: BGTNHYMJUKOPK
        $DNA: MNBFSDFGHJ

        $seq: seq3
        $DNA: USE_STRICT&USE_WARNINGS
        $DNA: lastline

         ===============
        seq1
        ASDFGHASDFGHJ
        ERTYUIOOIUYLK
        NBGFEWERTY
        seq3
        USE_STRICT&USE_WARNINGS
        lastline
        seq2
        BGTNHYMJUKOPK
        MNBFSDFGHJ

    Update

    Added comment re say at line 29

    Update2 : Missed a serious mistake here: the concat at line 36 actually inserts the last $DNA from the previous $seq into each new $seq. Bad on me! A fix is to reset $DNA to an empty string by inserting $DNA = ''; as a new line 26.
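
    For reference, a minimal sketch of the loop with the Update2 reset applied. This is one possible arrangement, not ww's exact code. The assumptions are mine: $DNA is declared inside the loop so it starts empty on every iteration, the debug say calls and the no warnings block are dropped, the key is stored without a trailing newline, the pattern uses \d+ rather than \d, and the input is assumed to begin with a header line.

```perl
use strict;
use warnings;

my @lines = qw( >seq1 ASDFGHASDFGHJ ERTYUIOOIUYLK NBGFEWERTY >seq2 BGTNHYMJUKOPK MNBFSDFGHJ );

my %all_hash;
my $seq;
for my $line (@lines) {
    my $DNA = '';                    # reset on every iteration (the Update2 fix)
    if ($line =~ /^>(seq\d+)/) {
        $seq = $1;                   # header line: switch key, nothing to append
    } else {
        $DNA = "$line\n";            # sequence line: keep a newline for readability
    }
    $all_hash{$seq} .= $DNA;         # appends '' for a header line, the line otherwise
}

for my $name (sort keys %all_hash) {
    print "$name\n$all_hash{$name}";
}
```

    Because $DNA is empty whenever a header line is seen, the last line of the previous sequence is no longer carried over into the new key.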