If you wish to process the file line by line, you can use the flip-flop operator. This would make it unnecessary to use explicit control flags. Here's an example:

while( <DATA> ) {
    print if /^Header2:/ .. eof;
}
__DATA__
<lsmothers@example.com> SMTP 0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
.X-Intermail-Unknown-MIME-Type=unparsedmessage
Header2: <headertwo@example.com
Received: from server.cluster1.example.com ([10.20.201.160])
line 12

Updated, as suggested in a followup to this post, to use eof as the right-hand side of the flip-flop. That is especially nice if the script is altered to read from <>, since eof then marks the end of each input file.

It seems strange to use the flip-flop if you're only concerned with the initial flip, but it does work nicely. And if you're processing more than one file, the eof on the right-hand side catches the end of each file and resets the search for the next one. The flip-flop operator is discussed in the "Range Operators" section of perlop, since it is the same '..' operator.
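To make the multi-file behaviour concrete, here is a minimal sketch. The helper name lines_from_header2 is made up for illustration and is not part of the original script; it assumes the files are small enough to collect their lines in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): for each named file,
# return every line from the first "Header2:" line to the end of that file.
sub lines_from_header2 {
    my @files = @_;
    my @kept;
    local @ARGV = @files;   # <> reads the files named in @ARGV
    local $_;               # keep the caller's $_ intact
    while (<>) {
        # eof without parentheses is true on the last line of the current
        # file, so the flip-flop switches off before the next file starts.
        push @kept, $_ if /^Header2:/ .. eof;
    }
    return @kept;
}
```

Calling it as print lines_from_header2('a.txt', 'b.txt') (file names are placeholders) should print the tail of each file separately. Note that eof() with empty parentheses would instead test for the end of all the files, so the bare eof matters here, and that an empty file list makes <> fall back to STDIN.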

If you prefer to slurp the file into a string and process accordingly, you can do it like this:

my $input = do { local $/ = undef; <DATA> };
if ( $input =~ /^(Header2:.+)/ms ) {
    print $1;
}

Or even...

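my $input;
{
    local $/ = undef;
    $input = <DATA>;
}
# A LIMIT of 2 splits only at the first "Header2:"; the captured
# separator is returned as an extra element between the two fields.
print join '', ( split /^(Header2:)/m, $input, 2 )[ 1, 2 ];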

The split method could be altered to avoid capturing by using a lookahead assertion as the split point, like this:

print join '', ( split /^(?=Header2:)/m, $input, 2)[1];

This method creates only two elements: the one we don't want and the one we do. The capturing split creates three elements: the one we don't want, the trigger text, and the rest of what we want to keep, so there we have to ask for both elements 1 and 2.
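A quick way to see the difference in element counts is to run both splits on the same string. This is only a sketch: the sample input below is invented for illustration, and both splits use a LIMIT of 2, which is enough because captured separators come back in addition to the fields.

```perl
use strict;
use warnings;

# Hypothetical sample input, shaped like the data in the post.
my $input = "junk line\nHeader2: <two\@example.com\nReceived: from somewhere\n";

# Capturing split: the separator itself comes back as an extra element.
my @with_capture = split /^(Header2:)/m, $input, 2;

# Lookahead split: the match is zero-width, so "Header2:" stays attached
# to the start of the second element.
my @with_lookahead = split /^(?=Header2:)/m, $input, 2;

printf "capture:   %d elements\n", scalar @with_capture;
printf "lookahead: %d elements\n", scalar @with_lookahead;
```

In both cases element 0 is the unwanted text before the header; the rest, joined together, is what we keep.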

One-liner versions of each of the above:

perl -ne 'print if /^Header2:/ .. eof' testdata.txt
perl -0777 -ne '/^(Header2:.+)/ms and print $1' testdata.txt
perl -0777 -pe '$_=join q//,(split /^(Header2:)/m,$_,2)[1,2]' testdata.txt
perl -0777 -pe '$_=join q//,(split /^(?=Header2:)/m,$_,2)[1]' testdata.txt

Dave


In reply to Re: regular expression - grabbing everything problem by davido
in thread regular expression - grabbing everything problem by notorious
