Re: parsing multiple lines

An alternative approach is to treat the parsing of your file as a state machine: initialize $status=0 and then, for each line of the file, determine the "type" of the line (title line, name, kegg, function evidence, process evidence, component evidence, other).

Switch on the line type and do as follows:
title line: if $status>0 call the output function (see later); then extract locus tag and name to two vars, initialize $kegg, $function, $process, $component as "unknown", set $status=1.
kegg: strip the "KEGG pathway:" portion of the line and put the remainder in $kegg, set $status=2.
function evidence: set $function='' and $status=3.
process evidence: set $process='' and $status=4.
component evidence: set $component='' and $status=5.
other: depending on the value of $status (between 2 and 5), append the line to the corresponding var. If $status<2, do nothing.

At end of file, if $status>0, call the output function again (this is needed to output the last block).

The output function should take the values stored in the 6 vars (locus tag, name, $kegg, $function, $process, $component) and print them to the output file.
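
The steps above could be sketched roughly as follows. This is only a minimal illustration, not tested against your real data: I have not seen the exact file layout, so the regexes for each line type (a title line starting with ">", headings beginning with "Function evidence" and so on), the tab-separated output and the sub names parse_blocks and format_record are all assumptions to adapt. Instead of printing directly, the sketch returns the output lines so the caller decides where they go.

```perl
use strict;
use warnings;

# Status values 2..5 map to the field that "other" lines are appended to.
my @FIELD_FOR_STATUS = (undef, undef, 'kegg', 'function', 'process', 'component');

# The "output function": one record as a tab-separated line
# (ASSUMPTION: the column layout is invented for illustration).
sub format_record {
    my ($rec) = @_;
    return join "\t", map { $rec->{$_} } qw(locus name kegg function process component);
}

# Runs the state machine over a list of lines, returns one string per block.
sub parse_blocks {
    my @lines  = @_;
    my $status = 0;
    my (%rec, @out);

    for my $line (@lines) {
        chomp $line;
        # ASSUMPTION: every pattern below is a guess at the file format.
        if ($line =~ /^>(\S+)\s*(.*)$/) {                 # title line
            push @out, format_record(\%rec) if $status > 0;
            %rec = (locus => $1, name => $2, kegg => 'unknown',
                    function => 'unknown', process => 'unknown',
                    component => 'unknown');
            $status = 1;
        }
        elsif ($line =~ /^KEGG pathway:\s*(.*)$/) {       # kegg
            $rec{kegg} = $1;
            $status = 2;
        }
        elsif ($line =~ /^Function evidence/i)  { $rec{function}  = ''; $status = 3; }
        elsif ($line =~ /^Process evidence/i)   { $rec{process}   = ''; $status = 4; }
        elsif ($line =~ /^Component evidence/i) { $rec{component} = ''; $status = 5; }
        else {                                            # other
            next if $status < 2;
            next if $line !~ /\S/;    # ASSUMPTION: blank lines carry no data
            my $field = $FIELD_FOR_STATUS[$status];
            $rec{$field} .= ' ' if length $rec{$field};
            $rec{$field} .= $line;
        }
    }
    # Needed to output the last block.
    push @out, format_record(\%rec) if $status > 0;
    return @out;
}
```

To use it, read the lines from your input filehandle, pass them to parse_blocks, and print each returned string followed by a newline to the output file.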

Rule One: Do not act incautiously when confronting a little bald wrinkly smiling man.

Re^2: parsing multiple lines by sm2004 (Acolyte) on May 22, 2008 at 00:51 UTC
I could definitely use this idea for some of my other scripts too. Thanks a lot.