in reply to parsing multiple lines

My preferred approach is basically the same as what has been said above, but I like to use a hash ref to keep track of the parsed information. Another way to look at this problem is that the file is a sequence of records, and after you parse a complete record you want to 'process' it in some fashion. The typical way to do this is:
    my $r = {};    # hash to hold the parsed record
    while (<IN>) {
        if (/^(\d+).../) {    # found beginning of new record
            if ($r->{id}) { process($r); }
            $r = {};          # begin new record
            $r->{id} = $1;    # populate parsed info from this line
        } elsif (/KEGG.../) {
            $r->{kegg} = ...;
        } elsif ...
        }
    }
    if ($r->{id}) { process($r) };

    sub process {
        my $r = shift;
        ...
    }
By changing the process subroutine you can re-use this code to perform different kinds of analyses on the file.
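For reference, here is a minimal runnable sketch of the same pattern. The input format (a line starting with digits opens a record, followed by a "KEGG:" line), the sample data, the name field, and the parse_records helper are all assumptions made up for illustration, not the format from the original question. The process step is passed in as a code ref so it can be swapped without touching the parser.

```perl
use strict;
use warnings;

# Hypothetical input: a line starting with digits opens a record,
# and a "KEGG:" line belongs to the most recently opened record.
my $data = <<'END';
101 first gene
KEGG: K00001
102 second gene
KEGG: K00002
END

# Parse $text into records, calling $process once per complete record.
sub parse_records {
    my ($text, $process) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory file: $!";
    my $r = {};                              # record being built
    while (my $line = <$in>) {
        if ($line =~ /^(\d+)\s+(.*)/) {      # start of a new record
            $process->($r) if defined $r->{id};
            $r = { id => $1, name => $2 };   # fresh hash for each record
        }
        elsif ($line =~ /^KEGG:\s*(\S+)/) {
            $r->{kegg} = $1;
        }
    }
    $process->($r) if defined $r->{id};      # flush the final record
    close $in;
    return;
}

# Example "process" step: collect the records, then print them.
my @seen;
parse_records($data, sub { push @seen, $_[0] });
printf "%s %s %s\n", $_->{id}, $_->{name}, $_->{kegg} // '-' for @seen;
```

Two small choices in this sketch: it tests defined $r->{id} rather than plain truth, so a record whose id happens to be 0 is not silently dropped, and it allocates a new hash for every record, so references saved by the callback stay valid after the parser moves on.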

Re^2: parsing multiple lines
by GrandFather (Saint) on May 21, 2008 at 21:19 UTC

    Why a hash ref rather than a hash? Surely it is simpler and clearer to write:

        my %record;
        while (<IN>) {
            if (/^(\d+).../) {    # found beginning of new record
                process (\%record) if $record{id};
                %record = ();        # Flush old record
                $record{id} = $1;    # populate parsed info from this line
                ...

    Perl is environmentally friendly - it saves trees
Re^2: parsing multiple lines
by sm2004 (Acolyte) on May 22, 2008 at 00:53 UTC
    Thanks a lot. I'm learning from all the alternative ideas.