regular expression - grabbing everything problem

notorious has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to run an expression to match a particular line of text in a file, then capture the entire rest of the file, using a Perl one-liner.

For example, in the following file, I want to match starting at Header2 (i.e. the first header after ".X-Intermail-Unknown-MIME-Type=unparsedmessage") to the end of the file.

<lsmothers@example.com>
SMTP
0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
.X-Intermail-Unknown-MIME-Type=unparsedmessage
Header2: <headertwo@example.com
Received: from server.cluster1.example.com ([10.20.201.160])
        line 12

EVERYTHING I have done around this does not catch it. An example of what I have tried is below. From what I read, I think the below method should work:

$ cat sample.txt | perl -wnl -e '/\.X\-Intermail\-Unknown\-MIME\-Type\=unparsedmessag(e.*)/s and print $1;'
e

The above returns only the e character; I included it in the capture on purpose to confirm that the string is matched. I am using the s modifier after the expression, which should let `.` match newlines as well.

Can anyone give me any suggestions? Thanks, Robert

Re: regular expression - grabbing everything problem by davido (Cardinal) on Aug 09, 2011 at 00:16 UTC
If you wish to process the file line by line, you can use the flip-flop operator. This would make it unnecessary to use explicit control flags. Here's an example:

    while( <DATA> ) {
        print if /^Header2:/ .. eof;
    }
    __DATA__
    <lsmothers@example.com>
    SMTP
    0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
    .X-Intermail-Unknown-MIME-Type=unparsedmessage
    Header2: <headertwo@example.com
    Received: from server.cluster1.example.com ([10.20.201.160])
            line 12

Updated as suggested in a followup to this post, by using eof as the RHS of the flip-flop. Nice if the script is altered to read from `<>`, as per the suggestion in a followup to this post.

It seems strange to use the flip-flop if you're only concerned with the initial flip. But it does work nicely. And if you're processing more than one file, it can be used to catch the end of file to reset the search for the next file. The flip-flop operator is discussed in the "Range Operators" section of perlop, as it's the same '`..`' operator.

If you prefer to slurp the file into a string and process accordingly, you can do it like this:

    my $input = do { local $/ = undef; <DATA> };
    if ( $input =~ /^(Header2:.+)/ms ) {
        print $1;
    }

Or even...

    my $input;
    {
        local $/ = undef;
        $input = <DATA>;
    }
    print join '', ( split /^(Header2:)/m, $input, 3 )[ 1, 2 ];

The split method could be altered to avoid capturing by using a lookahead assertion as the split point, like this:

    print join '', ( split /^(?=Header2:)/m, $input, 2 )[1];

This method creates only two elements: the one we don't want, and the one we do. The other split method created three elements: the one we don't want, the trigger text, and the rest of what we want to keep, so for that we have to specify that we want both elements 1 and 2.

One-liner versions of each of the above:

    perl -ne 'print if /^Header2:/ .. eof' testdata.txt
    perl -0777 -ne '/^(Header2:.+)/ms and print $1' testdata.txt
    perl -0777 -pe '$_=join q//,(split /^(Header2:)/m,$_,3)[1,2]' testdata.txt
    perl -0777 -pe '$_=join q//,(split /^(?=Header2:)/m,$_,2)[1]' testdata.txt

Dave
Re^2: regular expression - grabbing everything problem by jwkrahn (Abbot) on Aug 09, 2011 at 00:42 UTC
For your first example, 1 is always true, which will work fine if there is only one file in `@ARGV`; however, you should probably use eof instead.
Re^3: regular expression - grabbing everything problem by davido (Cardinal) on Aug 09, 2011 at 00:53 UTC
That's a great suggestion. I intended for it to be always true, which seemed fine for simplicity's sake. But if someone is using `@ARGV` or the empty diamond operator, eof is the answer. Excellent. Updating now. Thanks!

Dave
Re: regular expression - grabbing everything problem by jwkrahn (Abbot) on Aug 09, 2011 at 00:48 UTC
    $ cat sample.txt | perl -wnl -e '/\.X\-Intermail\-Unknown\-MIME\-Type\=unparsedmessag(e.*)/s and print $1;'

You need to slurp the entire file and you don't need to use cat:

    $ perl -0777nle '/\.X-Intermail-Unknown-MIME-Type=unparsedmessage(.*)/s and print $1;' sample.txt
Re: regular expression - grabbing everything problem by shmem (Chancellor) on Aug 09, 2011 at 03:13 UTC
"EVERYTHING I have done around this does not catch it."

Of course not, since with `perl -wnl -e` you are reading the file line by line. Try `perl -w -0777 -ne`. See also perlrun.
Re: regular expression - grabbing everything problem by JavaFan (Canon) on Aug 09, 2011 at 08:36 UTC
Untested:

    perl -ne 'print if $x; $x ||= $_ eq ".X-Intermail-Unknown-MIME-Type=unparsedmessage\n"' sample.txt
Re^2: regular expression - grabbing everything problem by notorious (Initiate) on Aug 09, 2011 at 21:58 UTC
That is really useful. Thank you all for your solutions!