notorious has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to match a particular line of text in a file and then capture the entire rest of the file, using a Perl one-liner.

For example, in the following file, I want to match from Header2 (i.e. the first header after ".X-Intermail-Unknown-MIME-Type=unparsedmessage") to the end of the file.

<lsmothers@example.com> SMTP 0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
.X-Intermail-Unknown-MIME-Type=unparsedmessage
Header2: <headertwo@example.com
Received: from server.cluster1.example.com ([10.20.201.160])
line 12

EVERYTHING I have done around this does not catch it. An example of what I have tried is below. From what I read, I think the below method should work:

$ cat sample.txt | perl -wnl -e '/\.X\-Intermail\-Unknown\-MIME\-Type\=unparsedmessag(e.*)/s and print $1;'
e

The above prints only the character e; I put the capture there on purpose to confirm that the string is matched. I am using the /s modifier on the expression, which should let "." match newlines as well.

Can anyone give me any suggestions? Thanks, Robert

Re: regular expression - grabbing everything problem
by davido (Cardinal) on Aug 09, 2011 at 00:16 UTC

    If you wish to process the file line by line, you can use the flip-flop operator. This would make it unnecessary to use explicit control flags. Here's an example:

    while( <DATA> ) {
        print if /^Header2:/ .. eof;
    }

    __DATA__
    <lsmothers@example.com> SMTP 0<001501c4db9b$db8b2680$2d01a8c0@ryand9v889t9uc>
    .X-Intermail-Unknown-MIME-Type=unparsedmessage
    Header2: <headertwo@example.com
    Received: from server.cluster1.example.com ([10.20.201.160])
    line 12

    Update: as suggested in a followup to this post, the RHS of the flip-flop is now eof. That is especially useful if the script is altered to read from <>.

    It may seem strange to use the flip-flop operator when you're only concerned with the initial flip, but it works nicely. And if you're processing more than one file, eof catches the end of each file and resets the search for the next one. The flip-flop operator is discussed in the "Range Operators" section of perlop, since it is the same '..' operator.
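The per-file reset can be tried without creating any files. The following is only a minimal sketch: tail_from_header is a hypothetical helper, and each string passed to it stands in for one input file by way of an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: each string in @texts stands in for one input file.
# Returns the lines from the first /^Header2:/ line to the end of each "file".
sub tail_from_header {
    my @texts = @_;
    my @kept;
    for my $text (@texts) {
        open my $fh, '<', \$text or die "cannot open in-memory file: $!";
        while ( my $line = <$fh> ) {
            # Flip-flop: becomes true on the Header2 line and stays true
            # until eof($fh) is true, i.e. through the last line of this handle.
            push @kept, $line if $line =~ /^Header2:/ .. eof($fh);
        }
        close $fh;
    }
    return @kept;
}

# Two stand-in "files"; the leading lines of the second one are skipped
# again because the flip-flop was switched off at the end of the first.
print tail_from_header(
    "skip me\nHeader2: first file\nkeep me\n",
    "skip me too\nHeader2: second file\n",
);
```

Because the right-hand side is eof on the current handle, a file that contains no Header2 line contributes nothing, and a file whose last line is the Header2 line contributes just that line.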

    If you prefer to slurp the file into a string and process accordingly, you can do it like this:

    my $input = do { local $/ = undef; <DATA> };
    if ( $input =~ /^(Header2:.+)/ms ) {
        print $1;
    }
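To see what each modifier contributes once the file is in a single string, here is a small sketch. The helper name header_tail_variants and the sample text are made up for illustration; the pattern is the one used above.

```perl
use strict;
use warnings;

# Hypothetical helper: try the same pattern with different modifiers and
# report what each one captures (undef when the pattern does not match).
sub header_tail_variants {
    my ($text) = @_;
    my %result;
    # /m lets ^ match at the start of every line; /s lets . match newlines.
    $result{with_ms} = $text =~ /^(Header2:.+)/ms ? $1 : undef;
    # Without /s, . stops at the first newline, so only one line is captured.
    $result{only_m}  = $text =~ /^(Header2:.+)/m  ? $1 : undef;
    # Without /m, ^ matches only at the very start of the string.
    $result{only_s}  = $text =~ /^(Header2:.+)/s  ? $1 : undef;
    return \%result;
}

my $sample   = "first line\nHeader2: <headertwo\@example.com\nReceived: from somewhere\n";
my $variants = header_tail_variants($sample);
for my $name ( sort keys %$variants ) {
    my $got = $variants->{$name};
    printf "%-8s %s\n", $name,
        defined $got ? "captured " . length($got) . " characters" : "no match";
}
```

With input like this, where Header2 is not on the first line, both modifiers are needed: /m to find the line, /s to keep going past its end.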

    Or even...

    my $input;
    {
        local $/ = undef;
        $input = <DATA>;
    }
    print join '', ( split /^(Header2:)/m, $input, 3 )[ 1, 2 ];

    The split method could be altered to avoid capturing by using a lookahead assertion as the split point, like this:

    print join '', ( split /^(?=Header2:)/m, $input, 2)[1];

    This method creates only two elements; the one we don't want, and the one we do. The other split method created three elements; the one we don't want, the trigger text, and the rest of what we want to keep, so for that we have to specify that we want both elements 1 and 2.
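The difference in element counts is easy to check directly. This is a sketch with two hypothetical helpers, split_with_capture and split_with_lookahead, that simply return what each split produces for the same string.

```perl
use strict;
use warnings;

# Hypothetical helpers showing what each split returns for the same input.
sub split_with_capture {
    my ($text) = @_;
    # The parentheses make split return the separator itself as an element.
    return split /^(Header2:)/m, $text, 3;
}

sub split_with_lookahead {
    my ($text) = @_;
    # The lookahead matches the empty string just before "Header2:",
    # so nothing is consumed and no separator element is produced.
    return split /^(?=Header2:)/m, $text, 2;
}

my $sample = "first line\nHeader2: value\nrest of file\n";

my @captured  = split_with_capture($sample);
my @lookahead = split_with_lookahead($sample);

print scalar(@captured),  " elements with the capturing split\n";
print scalar(@lookahead), " elements with the lookahead split\n";
print join( '', @captured[ 1, 2 ] ) eq $lookahead[1]
    ? "both give the same tail\n"
    : "the tails differ\n";
```

One thing to keep in mind: captured separators do not count toward LIMIT, so if the input may contain more than one Header2: line, the capturing form with a limit of 3 returns more than three elements, and a limit of 2 is what keeps the whole tail in elements 1 and 2.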

    One liner versions of each of the above:

    perl -ne 'print if /^Header2:/ .. eof' testdata.txt
    perl -0777 -ne '/^(Header2:.+)/ms and print $1' testdata.txt
    perl -0777 -pe '$_=join q//,(split /^(Header2:)/m,$_,3)[1,2]' testdata.txt
    perl -0777 -pe '$_=join q//,(split /^(?=Header2:)/m,$_,2)[1]' testdata.txt

    Dave

      For your first example, 1 is always true, which will work fine if there is only one file in @ARGV; however, you should probably use eof instead.

        That's a great suggestion. I intended it to be always true, which seemed fine for simplicity's sake. But if someone is using @ARGV or the empty diamond operator, eof is the answer. Excellent. Updating now. Thanks!


        Dave

Re: regular expression - grabbing everything problem
by jwkrahn (Abbot) on Aug 09, 2011 at 00:48 UTC
    $ cat sample.txt | perl -wnl -e '/\.X\-Intermail\-Unknown\-MIME\-Type\=unparsedmessag(e.*)/s and print $1;'

    You need to slurp the entire file and you don't need to use cat:

    $ perl -0777nle '/\.X-Intermail-Unknown-MIME-Type=unparsedmessage(.*)/s and print $1;' sample.txt
Re: regular expression - grabbing everything problem
by shmem (Chancellor) on Aug 09, 2011 at 03:13 UTC
    EVERYTHING I have done around this does not catch it.

    Of course not, since with

    perl -wnl -e

    you are reading the file line by line. Try

    perl -wn -0777 -e

    See also perlrun.
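The effect of the record separator can be reproduced in a short script. This is a sketch, not what the switches do internally: captures_per_record is a hypothetical helper that mimics -n with -l when $slurp is false, and the same with -0777 added when $slurp is true, by setting $/ itself.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the pattern to each record read from $text.
# With $slurp false, records are lines (like -n with -l);
# with $slurp true, the whole input is one record (like adding -0777).
sub captures_per_record {
    my ( $text, $slurp ) = @_;
    local $/ = $slurp ? undef : "\n";
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @found;
    while ( defined( my $record = <$fh> ) ) {
        chomp $record;    # what -l does; removes nothing when $/ is undef
        push @found, $1
            if $record =~ /\.X-Intermail-Unknown-MIME-Type=unparsedmessag(e.*)/s;
    }
    close $fh;
    return @found;
}

my $sample = "<lsmothers\@example.com>\n"
           . ".X-Intermail-Unknown-MIME-Type=unparsedmessage\n"
           . "Header2: <headertwo\@example.com\n"
           . "line 12\n";

for my $mode ( 0, 1 ) {
    my @found = captures_per_record( $sample, $mode );
    printf "%s: captured %d characters\n",
        $mode ? "slurp" : "line by line", length $found[0];
}
```

Line by line, the /s modifier has nothing to work with, because each record ends where the line ends; only when the whole file is one record can .* run on to the end of the file.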

Re: regular expression - grabbing everything problem
by JavaFan (Canon) on Aug 09, 2011 at 08:36 UTC
    Untested:
    perl -ne 'print if $x; $x ||= $_ eq ".X-Intermail-Unknown-MIME-Type=unparsedmessage\n"' sample.txt
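The flag idea can also be written out as a small script. This is a minimal sketch with a hypothetical helper, lines_after_marker; it assumes the marker line itself should be left out and that the flag has to stay set once the marker has been seen, which is why it uses ||= rather than plain assignment.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that follow the marker line.
sub lines_after_marker {
    my ( $marker, @lines ) = @_;
    my $seen = 0;
    my @kept;
    for my $line (@lines) {
        # Keep a line only if the marker was seen on an earlier line.
        push @kept, $line if $seen;
        # Once set, the flag stays set for the rest of the input.
        $seen ||= $line eq "$marker\n";
    }
    return @kept;
}

my $marker = ".X-Intermail-Unknown-MIME-Type=unparsedmessage";
print lines_after_marker(
    $marker,
    "<lsmothers\@example.com>\n",
    "$marker\n",
    "Header2: <headertwo\@example.com\n",
    "line 12\n",
);
```

Checking the flag before updating it is what keeps the marker line out of the output.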
      That is really useful. Thank you all for your solutions!