Regex to select multiple lines

beeny has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Regex to select multiple lines by pernod (Chaplain) on Dec 09, 2004 at 16:13 UTC
Ok, some heavy assumptions in the following code, but oh well. Here goes:

    #! /usr/bin/perl
    use strict;

    # Assume a blank line to mean no whitespace
    $/ = "\n\n";

    foreach my $record ( <DATA> ) {
        chomp( $record );
        my @data = split( "\n", $record );

        # Take the last element of the last three lines in the record
        my $txname    = ( split( / /, $data[ -3 ] ) )[-1];
        my $txtime    = ( split( / /, $data[ -2 ] ) )[-1];
        my $errorcode = ( split( / /, $data[ -1 ] ) )[-1];

        print "$txname\t\t\t$txtime\t\t$errorcode\n";
    }

    __DATA__
    Message Type:
    Name: Someone
    Time: Now
    Errorcode: 42

    Message Type: Seismic instability
    Source: Japan
    Name: Anyone
    Time: Yesterday
    Errorcode: 23

    Message Type:
    Author: Lewis Carrol
    Work: The hunting of whatever
    Name: Bellman
    Time: 12:32:1892
    Errorcode: Snark!

The main point here is the use of the Input Record Separator (`$/`) to split the file into records. Then you can split on newlines and fetch the last three rows. The code above contains quite a bit of room for improvement in the regex-department (grabbing results into `$1` and friends, for example), but I'll not muddy my point about the `$/`.

On my box, this (untidily) prints:

    Someone    Now    42
    Anyone    Yesterday    23
    Bellman    12:32:1892    Snark!

Good luck!

pernod
-- Mischief. Mayhem. Soap.
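As a rough sketch of the "`$1` and friends" improvement mentioned above: the record can be matched by field label instead of by line position, so the order of lines within a record stops mattering. The helper name `parse_record` and the in-memory sample are illustrative assumptions, not part of the original post; the field labels are the ones from the `__DATA__` section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pull three fields out of one
# blank-line-separated record by label rather than by line position.
sub parse_record {
    my ($record) = @_;

    # /m lets ^ and $ match at every line inside the multi-line record.
    # In list context each match returns its capture group.
    my ($name)  = $record =~ /^Name:[ \t]*(.*)$/m;
    my ($time)  = $record =~ /^Time:[ \t]*(.*)$/m;
    my ($error) = $record =~ /^Errorcode:[ \t]*(.*)$/m;

    return ( $name, $time, $error );
}

# Sample input shaped like the __DATA__ section above
my $input = join "\n",
    'Message Type: Seismic instability',
    'Source: Japan',
    'Name: Anyone',
    'Time: Yesterday',
    'Errorcode: 23',
    '',
    'Message Type:',
    'Name: Bellman',
    'Time: 12:32:1892',
    'Errorcode: Snark!',
    '';

# Read the string through an in-memory filehandle, one record per read
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
{
    local $/ = "\n\n";    # record separator: one blank line
    while ( my $record = <$fh> ) {
        chomp $record;
        print join( "\t", parse_record($record) ), "\n";
    }
}
close $fh;
```

Using `local $/` inside a block keeps the changed record separator from leaking into the rest of the program.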
Re^2: Regex to select multiple lines by beeny (Initiate) on Dec 09, 2004 at 17:02 UTC
Okay, after reviewing the replies to my question (thanks to all of you), I came up with this:

    sub chewitup {
        for ( $i = 0; $i < $array_size; $i++ ) {
            chomp( $data[$i] );

            # Skip blank lines; "next" takes no loop variable
            next if $data[$i] eq $blank;

            # Each line matches at most one label, so test them side by side
            if ( $data[$i] =~ /Message Type:/ ) {
                @details0 = split / /, $data[$i];
            }
            if ( $data[$i] =~ /Error Code:/ ) {
                @details2 = split / /, $data[$i];
            }
            if ( $data[$i] =~ /Transaction Time:/ ) {
                @details1 = split / /, $data[$i];

                # Assumes Transaction Time is the last field of a block,
                # as in the sample log, so the block is complete here
                $txname    = $details0[2];
                $txtime    = $details1[2];
                $errorcode = $details2[2];
                print OUT "$txname\t\t\t$txtime\t\t$errorcode\n";
            }
        }
    }

Opinions please... thanks
Re: Regex to select multiple lines by EdwardG (Vicar) on Dec 09, 2004 at 16:05 UTC
Have you considered dealing with your input text as a whole, rather than line by line? You could, for instance, split your input into "messages" with code like this:

    my @details = split /Message Type/, $data;

Once you have these "messages" as array elements, you can then look at each one as a discrete item, rather than having to deal with the problem of where one message ends and another starts.

If you know that the last few items are the ones you want, you can reference them in a number of ways, depending on your circumstance. One way is to simply use `$#array`, which gives you the subscript of the last element in the array, like this:

    my $last_item = $array[$#array];

The item-before-last would then be:

    my $item_before_last = $array[ $#array - 1 ];

Alternatively, if efficiency is not important, you could reverse the array like this:

    @array = reverse @array;    # ...and the last shall be first :)
    my $last_item = $array[0];

I hope this helps, but I suspect it won't be enough.
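Putting the two ideas above together, a minimal sketch might look like the following. It assumes the whole log is already in one string, that every message starts with a line beginning `Message Type:`, and that the two wanted fields are the last two non-blank lines of each message. The helper name `last_two_lines_per_message` and the second sample message are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the whole log text into messages and return
# the last two non-blank lines of each one as an array reference.
sub last_two_lines_per_message {
    my ($data) = @_;
    my @result;

    # The zero-width lookahead splits in front of each "Message Type:" line,
    # so the label stays attached to its own chunk.
    for my $message ( split /(?=^Message Type:)/m, $data ) {
        my @lines = grep { /\S/ } split /\n/, $message;
        next if @lines < 2;

        # Negative subscripts count from the end of the array:
        # $lines[-1] is $lines[$#lines], $lines[-2] is $lines[ $#lines - 1 ]
        push @result, [ @lines[ -2, -1 ] ];
    }
    return @result;
}

# Made-up sample in the shape of the log discussed in this thread
my $data = join "\n",
    'Message Type: DeleteSubscriber',
    'Request Text: <msg/>',
    'Error Code: 00000',
    'Transaction Time: 133',
    '',
    'Message Type: AddSubscriber',
    'Error Code: 00012',
    'Transaction Time: 87',
    '';

for my $pair ( last_two_lines_per_message($data) ) {
    print join( ' | ', @$pair ), "\n";
}
```

Negative subscripts avoid both the `$#array` arithmetic and the cost of reversing the array.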
Re: Regex to select multiple lines by davorg (Chancellor) on Dec 09, 2004 at 15:49 UTC
I think we need to see more of your code and a sample of your input data. And please read the formatting guidelines. Your array indexes look very confusing. -- <http://www.dave.org.uk> "The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
Re: Regex to select multiple lines by Animator (Hermit) on Dec 09, 2004 at 15:53 UTC
use code-tags!
Re^2: Regex to select multiple lines by beeny (Initiate) on Dec 09, 2004 at 16:01 UTC
This is the data I'm trying to extract from. Apologies for the lack of code tags in the previous post.

    Message Type: DeleteSubscriber
    Request Text: <?xml version="1.0" encoding="UTF-8"?><msg> <head> <RoundTripInfo>DeleteSubscriber</RoundTripInfo> </head> <body> <HostIdentifier> <VendorID>CSGSYSTEMS</VendorID> <SiteIdent>877400000000</SiteIdent> </HostIdentifier> <DeleteSubscriber> <BSSubscriberKey>1230000000000002</BSSubscriberKey> <ExternalStatus>Voluntary Disconnect</ExternalStatus> </DeleteSubscriber> </body></msg>
    Response Text: <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <msg><head><RoundTripInfo>DeleteSubscriber</RoundTripInfo></head><body><ClientIdentifier><CVendorID>JacobsRimell-APS</CVendorID><SiteIdent>CSGListener</SiteIdent></ClientIdentifier><Stdmsgresp><ErrorCode>00000</ErrorCode><ErrorMsg>Delete Subscriber: Subscriber is in abuse state</ErrorMsg><SuggestedAction/></Stdmsgresp></body></msg>
    Error Code: 00000
    Transaction Time: 133

What I'm trying to do is lift the Error Code, Transaction Time and Message Type from the log file. The number of lines varies from block to block. Thanks
Re^3: Regex to select multiple lines by Animator (Hermit) on Dec 09, 2004 at 16:07 UTC
Did you consider reading the file all at once into a plain scalar? If you really want to process the file/array line by line, then you can add a while loop inside your if-statement that does something (or maybe nothing, if you don't need the lines) with the data of those lines.
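A rough sketch of the read-it-all-at-once approach, under a few assumptions: each block starts with a `Message Type:` line, and its `Error Code:` line is immediately followed by its `Transaction Time:` line, as in the sample log beeny posted. The names `slurp_file` and `extract_fields`, the file name in the comment, and the second sample block are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one scalar.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undefined record separator: a single read returns everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: return one [ type, error code, time ] triple per block.
sub extract_fields {
    my ($log) = @_;
    my @rows;

    while (
        $log =~ m{
            ^Message\ Type:[ \t]*(.*?)[ \t]*\n     # capture 1: message type
            (?:(?!^Message\ Type:).*\n)*?          # skip lines, staying inside this block
            ^Error\ Code:[ \t]*(\S+)[ \t]*\n       # capture 2: error code
            ^Transaction\ Time:[ \t]*(\S+)         # capture 3: transaction time
        }xmg
      )
    {
        push @rows, [ $1, $2, $3 ];
    }
    return @rows;
}

# Made-up sample; in real use this could be slurp_file('transactions.log'),
# where the file name is only a placeholder.
my $log = join "\n",
    'Message Type: DeleteSubscriber',
    'Request Text: <msg>',
    '</msg>',
    'Response Text: <msg/>',
    'Error Code: 00000',
    'Transaction Time: 133',
    '',
    'Message Type: AddSubscriber',
    'Error Code: 00012',
    'Transaction Time: 87',
    '';

for my $row ( extract_fields($log) ) {
    my ( $txname, $errorcode, $txtime ) = @$row;
    print "$txname\t$txtime\t$errorcode\n";
}
```

The `/m` modifier anchors `^` at every line start, `/g` walks through the scalar block by block, and `/x` allows the pattern to be spread out and commented. Because the varying number of lines per block is absorbed by the middle group, no per-line bookkeeping is needed.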