multi-line match

pdotcdot has asked for the wisdom of the Perl Monks concerning the following question:

Re: multi-line match by broquaint (Abbot) on Aug 05, 2003 at 15:45 UTC
If your data is delimited by a blank line, you can change the `$/` (input record separator) variable to read the file in chunks, each ending at a double newline. Something like this should get you started on parsing your file:

```perl
{
    local $/ = "\n\n";    # read one blank-line-delimited record at a time
    while (<INPUT1>) {
        # /s lets . span lines; \n* \z drops the record's trailing blank line
        my ($head, $data) = m< ^ \* ([^\n]+) \n (.*?) \n* \z >xms
            or next;      # skip records that lack a "*" header line
        push @seqs    => [ split /\n/, $data ];
        push @headers => $head;
        print OUTFILE  "$head\n";
        print OUTFILE2 @{ $seqs[-1] }, "\n";
    }
}
```

The above code should save the data into `@seqs`, which will be an array of arrays, and the headers into `@headers` as a simple array, all the while printing the data into `OUTFILE2` and the headers into `OUTFILE`.

A couple of errors in your code: you were `exit`ing in the case of the first condition (maybe you meant `next`?), and you forgot to escape `*` in the second condition, which should've triggered a compile-time error.

HTH

`_________ broquaint`
Re: multi-line match by BrowserUk (Patriarch) on Aug 05, 2003 at 16:14 UTC
If there is any chance of there being more than one blank line between your records, you should set `$/ = '';` to enable "paragraph mode" in preference to `"\n\n"`. To quote from perlvar, under `$INPUT_RECORD_SEPARATOR`:

> Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline.

If your files are not too big, you could also set `local $/ = undef;` (or simply `local $/;`) to read the whole file into a scalar, and then use `m//g` in a while loop to process the records.

```perl
my $s = "abc\ncd\ne\n\npqr\nst\nf\n\n";

# Each /g match picks up the next group of three word-only lines
while ( $s =~ m[ (\w+) \n (\w+) \n (\w+) ]gx ) {
    print "$1:$2:$3\n";
}
```

Output:

```
abc:cd:e
pqr:st:f
```

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
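To make the perlvar distinction concrete, here is a minimal sketch that reads the same text with both settings. It assumes an in-memory filehandle stands in for the real input file; `read_records` and the sample data are hypothetical names made up for illustration.

```perl
use strict;
use warnings;

# Read every record from a string using the given input record separator.
# read_records() is a hypothetical helper, not part of any module.
sub read_records {
    my ($sep, $text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $sep;          # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two records separated by TWO blank lines (three newlines in a row).
my $sample = "abc\ncd\n\n\npqr\nst\n";

my @para   = read_records("",     $sample);   # paragraph mode
my @double = read_records("\n\n", $sample);   # literal two-newline separator

for my $case ( [ 'paragraph mode', \@para ], [ '"\n\n" separator', \@double ] ) {
    my ($label, $records) = @$case;
    my $second = $records->[1];
    printf "%s: %d records; second record starts with %s\n",
        $label, scalar @$records,
        $second =~ /\A\n/ ? 'a stray newline' : 'text';
}
```

With the `"\n\n"` setting the extra newline ends up at the front of the second record, which is why a regex anchored to the start of the record can fail on files with irregular spacing.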
Re: multi-line match by JamesNC (Chaplain) on Aug 05, 2003 at 17:02 UTC
It looks like you want the stuff between the headers as a message, so this simple deal will build an array indexed by header order. If you want to preserve the newlines, just remove the `chomp`; otherwise process the array as you like:

```perl
use strict;
use warnings;

my $i = 0;       # number of headers seen so far
my @messages;    # $messages[$n] holds the text that follows header $n+1

while (<DATA>) {
    if (/header/) {
        $i++;    # a header starts the next message
        next;
    }
    next unless $i;    # ignore any lines before the first header
    chomp;
    $messages[$i-1] .= "$_ ";
}

print "$_ \n" for @messages;

__DATA__
header
This is message one line 1.
This is message one line 2.
header
Yet some more example text. Blah, blah..
Line 2 more blah, blah..
```

JamesNC
Re: multi-line match by pdotcdot (Acolyte) on Aug 06, 2003 at 14:22 UTC
Thanks guys, I'll try out these different options! The errors in the code were cut-and-paste errors and debugging stuff I was trying out.

cheers
PC
Re: multi-line match by pdotcdot (Acolyte) on Aug 07, 2003 at 10:01 UTC
Hi, I have received a new file which contains no blank lines between entries, e.g.

```
header
asdasdddas
asdsadds
header
sdsdasd
asdsdds
```

I have tried modding the suggestions above but they all come back with garbage, and I have Super Searched as well to no avail. Sorry to ask for help again so soon, but time is pressing!

Thanks in advance
PC
Re: Re: multi-line match by BrowserUk (Patriarch) on Aug 07, 2003 at 11:23 UTC
The easiest way would be to use two passes. The first adds a blank line before each header:

```
perl -ple"$_ = qq[\n] . $_ if /^header/" infile >modified
```

NB! The quoting differs on *nix: wrap the script in single quotes there instead of double quotes.

The second pass is just whichever of the earlier answers you like best.
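If the file fits in memory, the two passes can probably be collapsed into one by splitting the slurped text just before each header line. This is only a sketch: it assumes every record starts with a line beginning with the literal word `header`, as in the sample above, and `split_on_headers` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Split text into records, each starting at a line that begins with "header".
# split_on_headers() is a hypothetical helper, not part of any module.
sub split_on_headers {
    my ($text) = @_;
    # Zero-width split: break just before every line-initial "header".
    my @records = split /^(?=header)/m, $text;
    # Discard any text that precedes the first header.
    return grep { /\Aheader/ } @records;
}

# Stand-in for the slurped file contents (assumed sample data).
my $text = "header\nasdasdddas\nasdsadds\nheader\nsdsdasd\nasdsdds\n";

for my $record ( split_on_headers($text) ) {
    my ($head, @lines) = split /\n/, $record;
    print "$head: @lines\n";
}
```

For a real file, the `$text` string would instead come from reading the handle with `local $/;` in effect, as described earlier in the thread.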
Re: Re: Re: multi-line match by pdotcdot (Acolyte) on Aug 08, 2003 at 09:33 UTC
Thanks very much BrowserUk and all monks. I'm still sorting out different multiline queries, but I am determined to get there! Thanks again!