Question on parsing a text file.

yoda54 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks,

I have a data file in which the information is sometimes split across multiple lines, like this:

somedata data blah \
continuation of previous line \
another continuation of previous line \
stand alone line
stand alone line
multiple line \
continuation of multiple line

My question is: how can I either 1) combine the multiple lines in advance, or 2) join them on the fly while reading the file line by line?

while(<F>) {
    if (multiple line) {
      # somehow look ahead and collapse
    } else {
      # it's a single line, so just process it
    }
}

Thanks for any suggestions.

Replies are listed 'Best First'.
Re: Question on parsing a text file. by jwkrahn (Abbot) on Jan 07, 2010 at 18:06 UTC
> join on the fly while reading the files in line by line?

This is very easy to do using redo:

    $ echo 'somedata data blah \
    continuation of previous line \
    another continuation of previous line \
    stand alone line
    stand alone line
    multiple line \
    continuation of multiple line' | perl -e'
    while ( <> ) {
        if ( s/\\\n// ) {
            $_ .= <>;
            redo;
            }
        print;
        }
    '
    somedata data blah continuation of previous line another continuation of previous line stand alone line
    stand alone line
    multiple line continuation of multiple line
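One edge case with this idiom is a file whose last line ends with a backslash: there is nothing left to append. The following is a minimal sketch of the same redo loop with an eof check, in the style of the example in perlsyn. The in-memory sample data and the @records array are assumptions for illustration, not part of the original reply.

```perl
use strict;
use warnings;

# Assumed sample data; the last line deliberately ends with a backslash.
my $data = "first \\\nsecond\nlast \\\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
while (<$fh>) {
    chomp;
    if (s/\\$//) {
        # Continuation marker found and stripped: pull in the next
        # physical line, unless the input has run out.
        unless ( eof $fh ) {
            $_ .= <$fh>;
            redo;    # re-run the loop body on the joined line
        }
    }
    push @records, $_;    # $_ now holds one complete logical line
}
close $fh;

print "$_\n" for @records;
```

The redo restarts the loop body without reading a new line, so the joined text goes through the same chomp and backslash test again.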
Re: Question on parsing a text file. by kennethk (Abbot) on Jan 07, 2010 at 17:54 UTC
If you want to 1) combine the multiple lines in advance, you could slurp the file by undefining $/, use a regular expression to strip "\\\n", and then split on the remaining newlines.

If you want to 2) join on the fly while reading the file line by line, you could add a buffer to your while loop and use next for flow control when the line ends with a backslash, like:

    my $buffer = q{};
    while (<F>) {
        $buffer .= $_;
        if ( $buffer =~ s/\\\n\z// ) {
            # line ended with a backslash (now stripped): keep reading
            next;
        }
        else {
            # process $buffer here
            $buffer = q{};
        }
    }
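The slurp-and-split option could look roughly like this. It is a sketch only: the helper name join_continuations and the in-memory sample data are hypothetical, and it assumes the file is small enough to hold in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file content as one string and
# returns the logical lines with continuations joined.
sub join_continuations {
    my ($text) = @_;
    $text =~ s/\\\n//g;          # drop every backslash-newline pair
    return split /\n/, $text;    # split on the newlines that remain
}

# Assumed sample data, read through an in-memory filehandle so the
# example runs without a file on disk.
my $data = "somedata data blah \\\ncontinuation of previous line\nstand alone line\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# With $/ undefined, a single read returns the entire file.
my $text = do { local $/; <$fh> };
close $fh;

print "$_\n" for join_continuations($text);
```

Localizing $/ inside the do block restores normal line-by-line reading for the rest of the program.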
Re: Question on parsing a text file. by BioLion (Curate) on Jan 07, 2010 at 17:55 UTC
A simple approach would be to do it on the fly, concatenating lines where needed and otherwise processing them directly. The approach I use most often is a tip I found in one of merlyn's posts: use a buffer.

    use strict;
    use warnings;
    use autodie qw/open close/;

    ## buffer for holding current text
    my $buffer = '';

    open( my $fh, '<', '/foo/bar.txt' );

    while (<$fh>) {
        ## remove the trailing newline
        chomp;
        if (s/\\$//) {
            ## line ended with a backslash (now stripped): append to buffer
            $buffer .= $_;
        }
        else {
            ## last (or only) line of a record: process it with anything buffered
            process( $buffer . $_ );    ## sub elsewhere...
            ## reset
            $buffer = '';
        }
    }
    ## a trailing backslash on the last line leaves text in the buffer
    process($buffer) if length $buffer;
    close $fh;

NB: Not tested! Just an example...

Update - This is merlyn's node: Re: Reading multiple lines?. Maybe not quite the same thing, but it inspired this response, or at least it was the first place I came across this kind of control structure... He also remembered to check for the 'end of file'...

Just a something something...