I need to regex multiple lines

legend has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to read a file and capture particular lines into different strings:

LENGTH: Some Content here

TEXT: Some Content Here

COMMENT: Some Content Here

I want to be able to get (LENGTH: ...) into one array, and so on. I'm trying to use Perl in slurp mode, but for some reason I'm having trouble. I solved the problem by reading the file line by line, but I want to use slurp mode because the real input looks something like this:

LENGTH: ......................................................
..................................................................
..................................................................

...................................................................
..................................................................

SUBJECT: .......................................................

COMMENT: .....................................................
....................................................................

As you can see, the data that I want is not limited to one line but spans multiple lines, so I thought matching a regex over multiple lines would be better. Do you have any suggestions on how to solve this problem?


Re: I need to regex multiple lines by hipowls (Curate) on Feb 26, 2008 at 07:48 UTC
There are two regex modifiers that help: /s, so that . matches \n, and /m, so that ^ and $ match at the start and end of logical lines. In addition, /x allows free formatting and embedded comments. Using these modifiers, your regex can be written as

    my @array = $line =~ m{
        ^(LENGTH:     # a line beginning with LENGTH:
        .*?)          # as little as possible until
        ^(SUBJECT:    # a line beginning with SUBJECT:
        .*?)          # as little as possible until
        ^(COMMENT:    # a line beginning with COMMENT:
        .*)           # and the rest
    }msx;

which will create an array with

    $VAR1 = [
        'LENGTH: ......................................................
    ..................................................................
    ..................................................................
    ...................................................................
    ..................................................................
    ',
        'SUBJECT: .......................................................
    ',
        'COMMENT: .....................................................
    ....................................................................
    '
    ];
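To see what /s and /m change in isolation, here is a minimal sketch on a made-up three-line string (the sample text and variable names are assumptions for illustration, not taken from the original data):

```perl
use strict;
use warnings;

# Sample text, made up for illustration.
my $text = "LENGTH: one\ntwo\nSUBJECT: three\n";

# Without /s, "." stops at a newline, so only the rest of the first line is captured.
my ($no_s) = $text =~ /^LENGTH:(.*)/;

# With /s, "." also matches newlines; with /m, "^" matches at the start
# of every line, so "^SUBJECT:" can anchor the lookahead mid-string.
my ($with_sm) = $text =~ /^LENGTH:(.*?)(?=^SUBJECT:)/ms;

print "without /s: [$no_s]\n";
print "with /ms:   [$with_sm]\n";
```

The second capture keeps the embedded newlines, which is what the multi-line fields in the question need.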
Re: I need to regex multiple lines by jwkrahn (Abbot) on Feb 26, 2008 at 07:39 UTC
If you want lines into strings:

    my ( $key, %data );
    while ( <FH> ) {
        if ( /^([^:]+):/ ) {
            $data{ $key = $1 } = $_;
        }
        else {
            $data{ $key } .= $_;
        }
    }

If you want lines into arrays:

    my ( $key, %data );
    while ( <FH> ) {
        if ( /^([^:]+):/ ) {
            $data{ $key = $1 } = [ $_ ];
        }
        else {
            push @{ $data{ $key } }, $_;
        }
    }

If you just want an array instead of a hash:

    my ( $key, @data );
    while ( <FH> ) {
        if ( /^([^:]+):/ ) {
            push @data, [ $_ ];
        }
        else {
            push @{ $data[ -1 ] }, $_;
        }
    }

Update: and with a single array:

    my ( $key, @data );
    while ( <FH> ) {
        if ( /^([^:]+):/ ) {
            push @data, $_;
        }
        else {
            $data[ -1 ] .= $_;
        }
    }
Re^2: I need to regex multiple lines by svenXY (Deacon) on Feb 26, 2008 at 07:54 UTC
Hi, jwkrahn++, but only if there are no further colons in the text... Regards, svenXY
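The colon caveat can be illustrated with a small sketch. The input lines and the list of known keys below are assumptions made up for the example; the loose pattern is the one from the parent reply:

```perl
use strict;
use warnings;

# Made-up input: the second line is a continuation that contains a colon.
my @lines = (
    "COMMENT: see below\n",
    "note: this is still part of the comment\n",
);

# Hypothetical list of known keys; adjust to the real field names.
my $known = qr/^(LENGTH|SUBJECT|TEXT|COMMENT):/;

my ( @loose, @tight );
for my $line (@lines) {
    push @loose, $1 if $line =~ /^([^:]+):/;    # matches any "xxx:" prefix
    push @tight, $1 if $line =~ $known;         # matches known keys only
}

print "loose keys: @loose\n";
print "tight keys: @tight\n";
```

The loose pattern treats "note" as a new key, while the pattern restricted to known keys leaves that line as a continuation.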
Re: I need to regex multiple lines by grizzley (Chaplain) on Feb 26, 2008 at 07:56 UTC
Assuming that the input starts with an uppercase key and that your keys are uppercase strings, I would suggest the following:

    my ( $key, %hash );
    while (<>) {
        if ( s/^([A-Z]+):// ) { $key = $1 }
        push @{ $hash{$key} }, $_;
    }
    # use the array stored under 'LENGTH'
    print @{ $hash{'LENGTH'} };
Re: I need to regex multiple lines by Erez (Priest) on Feb 26, 2008 at 12:54 UTC
Structural Regular Expressions to the rescue! Assign the delimiter of the values to $/ (the empty string, '', might do here), which would gulp each part and not the whole file.

Software speaks in tongues of man. Stop saying 'script'. Stop saying 'line-noise'. We have nothing to lose but our metaphors.
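As a minimal sketch of the $/ = '' suggestion (paragraph mode), assuming made-up sample data read from an in-memory filehandle rather than a real file:

```perl
use strict;
use warnings;

# Made-up sample with blank lines between fields.
my $text = "LENGTH: a\nb\n\nSUBJECT: c\n\nCOMMENT: d\ne\n";

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @records;
{
    local $/ = '';          # paragraph mode: records end at blank lines
    while ( my $record = <$fh> ) {
        chomp $record;      # in paragraph mode this strips all trailing newlines
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
print "[$_]\n" for @records;
```

One caveat: the LENGTH field in the original question contains a blank line, so paragraph mode would split that field into two records. It only fits if blank lines occur between fields and never inside them.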
The requested "slurp-solution" by rminner (Chaplain) on Feb 26, 2008 at 09:36 UTC
Comment: The end-of-string anchor \Z in $capture is needed because of the lookahead (?=...) in the regex. Without it, the regex wouldn't match the last entry.

    use strict;
    use warnings;
    use File::Slurp;

    my $wholefile = read_file('data.txt');
    my $capture   = qr{(LENGTH|SUBJECT|COMMENT|\Z)};
    while ( $wholefile =~ m{^$capture:(.*?)(?=$capture)}smgcx ) {
        my ( $type, $data ) = ( $1, $2 );
        print "Type: $type\n";
        print "Data: $data\n";
    }
Re: I need to regex multiple lines by locked_user sundialsvc4 (Abbot) on Feb 26, 2008 at 16:16 UTC
Yet another approach to consider is one that might be used, say, with Perl's little brother, awk. This tool is based on the idea of “here's a bunch of regular expressions and code blocks. For each line, find the matching expression(s) and do what they say.” Importantly, there is also a BEGIN block that's executed before the first line, and an END block that's executed afterwards. (Yes, this is where Larry Wall got that idea...)

So what you can do is define a “state machine” of sorts. For instance, when you see a line that starts with 'LENGTH' you go into this mode; when you see 'SUBJECT' you go into that mode, and so on. The “mode” value then tells you what to do with each line that does not match any of these; say, a line consisting of dots.

What's “the right way” to do it? Of course there is none. But this approach is useful to keep in mind when you must deal with a more complicated issue such as parsing a printed-output file. Finally, for very complicated inputs, you can actually use a true parser.
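The state-machine idea could be sketched in Perl roughly as follows. The input lines are made up, and the names $mode and %data are illustrative assumptions; only the field names come from the question:

```perl
use strict;
use warnings;

# Made-up sample; the field names come from the question.
my @lines = (
    "LENGTH: 10\n", "more length\n",
    "SUBJECT: hello\n",
    "COMMENT: first\n", "second\n",
);

my $mode;    # current state: name of the field being collected
my %data;    # field name => accumulated text

for my $line (@lines) {
    if ( $line =~ /^(LENGTH|SUBJECT|COMMENT):/ ) {
        $mode = $1;                # a header line switches state
        $data{$mode} = $line;
    }
    elsif ( defined $mode ) {
        $data{$mode} .= $line;     # other lines belong to the current state
    }
    # lines before the first header are ignored
}

print "$_ => $data{$_}" for sort keys %data;
```

Each header line changes the mode, and every other line is appended to whatever field the current mode names.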
Re: I need to regex multiple lines by legend (Sexton) on Feb 26, 2008 at 19:41 UTC
Wow... so many solutions. Thank you all so much. The $/ interests me a lot. I have read the article and decided upon a delimiter, but I'm not really sure how to make it work. I'm trying:

    open(IN, "filename.txt");
    $/ = '/delimiter here/';
    while(<>) {
    }

But I'm confused: how do I read chunks of data and operate upon them? I mean, grab the chunk with the delimited text and then perform some regex matching on it.
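A minimal sketch of reading delimiter-separated chunks and matching a regex against each one. Two points are worth noting: $/ holds a literal string, not a regex, so the slashes in '/delimiter here/' would be taken literally; and while(<>) reads from STDIN or the files named in @ARGV, not from the IN handle that was opened. The "----" separator line and the in-memory data below are assumptions for illustration:

```perl
use strict;
use warnings;

# Made-up data and delimiter; "----\n" is a hypothetical separator line.
my $text = "LENGTH: a\nb\n----\nSUBJECT: c\n----\nCOMMENT: d\n";

open my $in, '<', \$text or die "cannot open in-memory file: $!";

my %field;
{
    local $/ = "----\n";               # a literal string, not a regex
    while ( my $chunk = <$in> ) {      # read from the handle that was opened
        chomp $chunk;                  # removes the trailing delimiter, if present
        if ( $chunk =~ /^(\w+):\s*(.*)/s ) {
            $field{$1} = $2;           # key => everything after "KEY:"
        }
    }
}
close $in;

print "$_: $field{$_}" for sort keys %field;
```

For a real file, the open call would take the file name instead of the scalar reference, for example open my $in, '<', 'filename.txt'.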