beeny has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I'm trying to create a regex that will select multiple lines of text into an array. Each block starts with a keyword and ends with a blank line. @data holds the lines of the log file that I'm trying to parse and extract from. My problem is that the blocks of text vary in the number of lines, so counting forward from the matching line will not work, but counting back from the blank-line end marker will. So far I have:

if($data[$i] =~ /Message Type: /){
    $j=$i+13; #This will not work
    $k=$i+14; #This will not work
    @details0=split / /, $data[$i];
    @details1=split / /, $data[$j];
    @details2=split / /, $data[$k];
    $txname=$details0[2];
    $txtime=$details2[2];
    $errorcode=$details1[2];
    print OUT "$txname\t\t\t$txtime\t\t$errorcode\n";
}
Many many thanks from a rapidly balding chap in advance, Beeny

Replies are listed 'Best First'.
Re: Regex to select multiple lines
by pernod (Chaplain) on Dec 09, 2004 at 16:13 UTC

    Ok, some heavy assumptions in the following code, but oh well. Here goes:

    #! /usr/bin/perl
    use strict;

    # Assume a blank line to mean no whitespace
    $/ = "\n\n";

    foreach my $record ( <DATA> ) {
        chomp( $record );
        my @data = split( "\n", $record );

        # Take the last element of the last three lines in the record
        my $txname    = ( split( / /, $data[ -3 ] ) )[-1];
        my $txtime    = ( split( / /, $data[ -2 ] ) )[-1];
        my $errorcode = ( split( / /, $data[ -1 ] ) )[-1];
        print "$txname\t\t\t$txtime\t\t$errorcode\n";
    }

    __DATA__
    Message Type:
    Name: Someone
    Time: Now
    Errorcode: 42

    Message Type: Seismic instability
    Source: Japan
    Name: Anyone
    Time: Yesterday
    Errorcode: 23

    Message Type:
    Author: Lewis Carrol
    Work: The hunting of whatever
    Name: Bellman
    Time: 12:32:1892
    Errorcode: Snark!

    The main point here is the use of the Input Record Separator ($/) to split the file into records. Then you can split on newlines and fetch the last three rows. The code above contains quite a bit of room for improvement in the regex-department (grabbing results into $1 and friends, for example), but I'll not muddy my point about the $/.
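    As a rough sketch of the "grabbing results into $1 and friends" idea, the record separator can be combined with one capturing regex per field. The field labels ("Message Type:", "Error Code:", "Transaction Time:") come from the sample log posted elsewhere in this thread; the helper name parse_record, the in-memory sample, and the "AddSubscriber" record are made up for illustration. It also assumes each label starts its own line and that no blank lines occur inside a block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull three fields out of one blank-line-delimited record.
# Assumes every label sits at the start of its own line.
sub parse_record {
    my ($record) = @_;
    my ($txname)    = $record =~ /^Message Type:[ \t]*(\S+)/m;
    my ($errorcode) = $record =~ /^Error Code:[ \t]*(\S+)/m;
    my ($txtime)    = $record =~ /^Transaction Time:[ \t]*(\S+)/m;
    return ( $txname, $txtime, $errorcode );
}

# Made-up sample input standing in for the real log file.
my $sample = <<'END';
Message Type: DeleteSubscriber
Request Text: <msg>...</msg>
Error Code: 00000
Transaction Time: 133

Message Type: AddSubscriber
Error Code: 00042
Transaction Time: 7
END

my @rows;
{
    open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
    local $/ = "";    # paragraph mode: a record ends at one or more blank lines
    while ( my $record = <$in> ) {
        my @fields = parse_record($record);
        next unless defined $fields[0];    # skip blocks without a Message Type
        push @rows, join( "\t", map { defined $_ ? $_ : '' } @fields );
    }
}
print "$_\n" for @rows;
```

    Setting $/ to the empty string rather than "\n\n" treats any run of blank lines as a single separator, which may be more forgiving if the log sometimes has two blank lines between blocks.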

    On my box, this (untidily) prints:

    Someone     Now         42
    Anyone      Yesterday   23
    Bellman     12:32:1892  Snark!

    Good luck!

    pernod
    --
    Mischief. Mayhem. Soap.

      Okie after reviewing the replies to my question, thanks to all of you. I came up with this.
      sub chewitup {
          for($i=0; $i<$array_size; $i++){
              chomp($data[$i]);
              if($data[$i] =~ /Message Type:/){
                  @details0=split / /, $data[$i];
                  unless($data[$i] eq $blank){
                      if($data[$i] =~ /Error Code:/){
                          @details2=split / /, $data[$i];
                      }
                      if($data[$i] =~ /Transaction Time:/){
                          @details1=split / /, $data[$i];
                      }
                      next $i;
                  }
                  $txname=$details0[2];
                  $txtime=$details1[2];
                  $errorcode=$details2[2];
                  print OUT "$txname\t\t\t$txtime\t\t$errorcode\n";
              }
          }
      }

      Opinions please...thanks
Re: Regex to select multiple lines
by EdwardG (Vicar) on Dec 09, 2004 at 16:05 UTC

    Have you considered dealing with your input text as a whole, rather than line by line? You could, for instance, split your input into "messages" with code like this:

    my @details = split /Message Type/, $data;

    Once you have these "messages" as array elements, you can then look at each one as a discrete item, rather than having to deal with the problem of where one message ends and another starts.
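    One possible variation on that split, sketched under assumptions: a plain split /Message Type/ consumes the keyword and leaves an empty first element when the input begins with it, so this version splits in front of each "Message Type:" line with a lookahead instead. The sample string is made up; with a real file, the whole log could be read into $data by localising $/ to undef before reading.

```perl
use strict;
use warnings;

# Made-up stand-in for the whole log held in one scalar.
my $data = join '',
    "Message Type: DeleteSubscriber\n",
    "Error Code: 00000\n",
    "Transaction Time: 133\n",
    "\n",
    "Message Type: AddSubscriber\n",
    "Error Code: 00042\n",
    "Transaction Time: 7\n";

# Split in front of each "Message Type:" line; the zero-width lookahead
# keeps the keyword inside the chunk it introduces.
my @messages = grep { /\S/ } split /^(?=Message Type:)/m, $data;

printf "%d messages\n", scalar @messages;
print "first message starts with: ", ( split /\n/, $messages[0] )[0], "\n";
```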

    If you know that the last few items are the ones you want, you can reference them in a number of ways, depending on your circumstance.

    One way is to simply use $#array, which gives you the subscript of the last element in the array, like this:

    my $last_item = $array[$#array];

    The item-before-last would then be:

    my $item_before_last = $array[ $#array - 1 ];

    Alternatively, if efficiency is not important, you could reverse the array like this:

    @array = reverse @array; # ...and the last shall be first :)
    my $last_item = $array[0];
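    Perl also accepts negative subscripts, which count back from the end of the array and avoid both the $#array arithmetic and the reverse. A minimal sketch, with made-up array contents:

```perl
use strict;
use warnings;

# Made-up contents; any array works the same way.
my @array = ( 'Name: Someone', 'Time: Now', 'Errorcode: 42' );

my $last_item        = $array[-1];    # same element as $array[$#array]
my $item_before_last = $array[-2];    # same element as $array[ $#array - 1 ]

# An array slice fetches several trailing elements at once, without reversing.
my ( $time, $code ) = @array[ -2, -1 ];

print "$last_item\n$item_before_last\n";
```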

    I hope this helps, but I suspect it won't be enough.

     

Re: Regex to select multiple lines
by davorg (Chancellor) on Dec 09, 2004 at 15:49 UTC

    I think we need to see more of your code and a sample of your input data.

    And please read the formatting guidelines. Your array indexes look very confusing.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Regex to select multiple lines
by Animator (Hermit) on Dec 09, 2004 at 15:53 UTC
    use code-tags!
      this is the data I'm trying to extract from. Apologies for the lack of code tags in the previous post
      Message Type: DeleteSubscriber
      Request Text: <?xml version="1.0" encoding="UTF-8"?><msg> <head> <RoundTripInfo>DeleteSubscriber</RoundTripInfo> </head> <body> <HostIdentifier> <VendorID>CSGSYSTEMS</VendorID> <SiteIdent>877400000000</SiteIdent> </HostIdentifier> <DeleteSubscriber> <BSSubscriberKey>1230000000000002</BSSubscriberKey> <ExternalStatus>Voluntary Disconnect</ExternalStatus> </DeleteSubscriber> </body></msg>
      Response Text: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <msg><head><RoundTripInfo>DeleteSubscriber</RoundTripInfo></head><body><ClientIdentifier><CVendorID>JacobsRimell-APS</CVendorID><SiteIdent>CSGListener</SiteIdent></ClientIdentifier><Stdmsgresp><ErrorCode>00000</ErrorCode><ErrorMsg>Delete Subscriber: Subscriber is in abuse state</ErrorMsg><SuggestedAction/></Stdmsgresp></body></msg>
      Error Code: 00000
      Transaction Time: 133

      What I'm trying to do is lift the Error Code, Transaction Time and Message Type from the log file. The number of lines varies between the blocks that I'm matching.
      Thanks

        Did you consider reading the file all at once in a plain scalar?

        If you really want to process the file/array line by line, then you can add a while loop inside your if-statement that does something (or maybe nothing, if you don't need the lines) with the data on those lines.
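        A minimal sketch of that line-by-line approach, assuming the field labels from the sample log above and that each block ends at a blank line (or at the end of the data). The @data contents and the "AddSubscriber" record are made up, and the results go into @report rather than to the OUT filehandle so the example stands alone.

```perl
use strict;
use warnings;

# Made-up stand-in for the @data array of log lines from the original post.
my @data = (
    "Message Type: DeleteSubscriber\n",
    "Request Text: <msg>...</msg>\n",
    "Error Code: 00000\n",
    "Transaction Time: 133\n",
    "\n",
    "Message Type: AddSubscriber\n",
    "Error Code: 00042\n",
    "Transaction Time: 7\n",
);

my @report;
my $i = 0;
while ( $i < @data ) {
    if ( $data[$i] =~ /^Message Type:[ \t]*(\S*)/ ) {
        my %field = ( name => $1, time => '', code => '' );
        $i++;

        # Inner loop: consume lines until the blank line that ends the block.
        while ( $i < @data && $data[$i] =~ /\S/ ) {
            $field{code} = $1 if $data[$i] =~ /^Error Code:\s*(\S+)/;
            $field{time} = $1 if $data[$i] =~ /^Transaction Time:\s*(\S+)/;
            $i++;
        }
        push @report, "$field{name}\t\t\t$field{time}\t\t$field{code}";
    }
    else {
        $i++;    # not the start of a block; move on
    }
}
print "$_\n" for @report;
```

        Because the inner loop stops on the blank line rather than at a fixed offset, the number of lines in each block no longer matters.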