Re^2: Concatenating text for a hash problem

We came up with nearly the exact same solution. I believe you can eliminate $val, however, by operating directly on the hash value.

use strict;
use warnings;

my %hash;
my $id;

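# Reads the records from the __DATA__ section of the script.
while( my $line = <DATA> )
{
    chomp $line;

    # if( $line =~ m/^>(.+?) / )  # changed to \S
    if( $line =~ m/^>(\S+)/ )
    {
        # Header line: remember the ID for the sequence lines that follow.
        $id = $1;
    }
    else
    {
        # Sequence line: append it directly to the hash value for the current ID.
        $hash{$id} .= $line;
    }
}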

ewijaya, if you are trying to read sequence files in FASTA format, you might want to look at BioPerl's Bio::SeqIO class.

Update: ihb made a good point about the regex. I assumed (based on the example data) that a space will always follow the sequence ID, but that may not always be true, so ihb's regex is a bit safer. Keep in mind, though, that \w will not match '.' or '-', so \S is probably the better choice.
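To make the difference concrete, here is a small sketch that runs the three patterns mentioned above against a few header lines. The header strings and the capture_id helper are made up for illustration; they are not taken from the OP's data.

```perl
use strict;
use warnings;

# Hypothetical header lines, invented for illustration only.
my @headers = ( '>EP40007 some description', '>EP40007', '>seq-1.2 desc' );

# The three patterns discussed in this thread.
my %pattern = (
    'lazy+space' => qr/^>(.+?) /,   # original: requires a space after the ID
    'word'       => qr/^>(\w+)/,    # stops at '.' or '-'
    'non-space'  => qr/^>(\S+)/,    # stops only at whitespace
);

# Returns the captured ID, or undef if the pattern does not match.
sub capture_id {
    my ( $re, $line ) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $name ( sort keys %pattern ) {
    for my $line (@headers) {
        my $id = capture_id( $pattern{$name}, $line );
        printf "%-10s %-28s => %s\n", $name, $line,
            defined $id ? $id : '(no match)';
    }
}
```

The lazy pattern finds nothing in a bare ">EP40007" line, and \w cuts "seq-1.2" down to "seq"; \S+ handles both cases.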



Replies are listed 'Best First'.

Re^3: Concatenating text for a hash problem
by ihb (Deacon) on Oct 15, 2004 at 06:38 UTC

Your regex change may be a bit too clever. We don't know for sure that the space will always be there. Looking at the OP, it's fair to assume that there will always be something, though. Perhaps the next line is just ">EP40007". A better way to express what you want without being as restrictive is /^>(\S+)/, which does what you want: it captures the leading run of non-space characters. Personally I'd probably do the check in two steps: one to see if there's a '>' there (assuming that /^>/ means a header line), and the next to see if the rest of the line holds a valid format. I habitually verify foreign input.

ihb

Read argumentation in its context!
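A minimal sketch of the two-step check described above, under stated assumptions: the parse_header name and the $valid_id pattern are hypothetical, and what counts as a "valid format" depends on the real data.

```perl
use strict;
use warnings;

# Assumed ID format, for illustration only: word characters, '.' and '-'.
my $valid_id = qr/^[\w.-]+$/;

# Returns the ID for a well-formed header line, undef for a non-header
# line, and dies on a header whose ID is missing or looks invalid.
sub parse_header {
    my ($line) = @_;

    # Step 1: is this a header line at all?
    return unless $line =~ /^>/;

    # Step 2: does the rest of the line start with an acceptable ID?
    my ($id) = $line =~ /^>(\S+)/
        or die "Header without an ID: '$line'\n";
    die "Unexpected ID format: '$id'\n" unless $id =~ $valid_id;

    return $id;
}

for my $line ( '>EP40007', '>EP40007 description', 'ACGTACGT' ) {
    my $id = parse_header($line);
    print defined $id ? "header, id=$id\n" : "not a header: $line\n";
}
```

Separating the two steps means a malformed header is reported instead of being silently appended to the previous sequence.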
