These features can be put to good use by having "split" do a "first-level" parse of the record, which gets around some of the difficulties mentioned in the earlier reply.
I'm not sure exactly what sort of structure you want as output, but here's one approach, which you can probably tweak to suit your taste:
(update: added $timestamp to the print statement, which shows that it's not just a timestamp, but also an IP address.)

    #!/usr/bin/perl
    use strict;
    use Data::Dumper;

    # read one "messages:"-delimited record at a time
    $/ = 'messages:';

    while ( <DATA> ) {
        my %struct = ();
        # the capturing parens make split return the "name:" separators as well
        my ( $timestamp, @chunks ) = split( /(\S+:)\s+/ );
        while ( @chunks ) {
            my $topkey = shift @chunks;
            my $data = shift @chunks;
            # peel off one key=value pair per pass; values may be double-quoted
            while ( $data =~ s/^(.*?)=//s ) {
                ( my $subkey = $1 ) =~ s/\s+$//;
                if ( $data =~ s/^"([^"]+)"\s+// ) {
                    $struct{$topkey}{$subkey} = $1;
                }
                else {
                    $data =~ s/^(\S+)\s+//;
                    $struct{$topkey}{$subkey} = $1;
                }
            }
        }
        print "\nRecord $.: $timestamp\n", Dumper( \%struct );
    }
    __DATA__
    messages:Dec 17 09:41:08 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-17 09:45:58" duration=5 policy_id=0 service=tcp/port:8000 proto
    =6 src zone=Trust dst zone=Untrust action=Permit sent=1034 rcvd=19829 src=10.14.94.221 dst=10.14.90.217 src_port=1059 dst_port=8000 translated ip=10.14.93.7 port=1223
    messages:Dec 17 09:41:08 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-17 09:45:59" duration=4 policy_id=0 service=tcp/port:8000 proto
    =6 src zone=Trust dst zone=Untrust action=Permit sent=722 rcvd=520 src=10.14.94.221 dst=10.14.90.217 src_port=1060 dst_port=8000 translated ip=10.14.93.7 port=1224
That gives you a hash of hashes (HoH) for each record, i.e. one per iteration of the outer loop. Maybe you want to push those onto an array? And/or maybe you don't need all the information?
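Here's a rough sketch of both ideas: collecting the per-record hashes in an array, and then keeping only a few fields. It is only one way to do it. The sub name parse_record, the @records / @summary layout and the shortened sample records are all made up for this example, and the sub differs slightly from the script above in that it skips a value it cannot match instead of reusing a stale $1.

```perl
use strict;
use warnings;

# Two shortened sample records (trimmed for this sketch), with the same
# awkward line break between "proto" and "=6".
my $log = <<'END';
messages:Dec 17 09:41:08 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-17 09:45:58" duration=5 proto
=6 src zone=Trust action=Permit sent=1034 rcvd=19829
messages:Dec 17 09:41:08 10.14.93.7 ns5xp: NetScreen device_id=ns5xp system-notification-00257(traffic): start_time="2002-12-17 09:45:59" duration=4 proto
=6 src zone=Trust action=Permit sent=722 rcvd=520
END

# Same parsing steps as the script above, wrapped in a sub
# (parse_record is a hypothetical name).
sub parse_record {
    my ($record) = @_;
    my %struct;
    my ( $timestamp, @chunks ) = split /(\S+:)\s+/, $record;
    while (@chunks) {
        my $topkey = shift @chunks;
        my $data   = shift @chunks;
        next unless defined $data;
        while ( $data =~ s/^(.*?)=//s ) {
            ( my $subkey = $1 ) =~ s/\s+$//;
            if ( $data =~ s/^"([^"]+)"\s+// ) {
                $struct{$topkey}{$subkey} = $1;
            }
            elsif ( $data =~ s/^(\S+)\s+// ) {
                $struct{$topkey}{$subkey} = $1;
            }
        }
    }
    return ( $timestamp, \%struct );
}

# Push one hashref per record onto an array.
my @records;
open my $fh, '<', \$log or die "Cannot read in-memory log: $!";
{
    local $/ = 'messages:';
    while ( my $record = <$fh> ) {
        chomp $record;                  # drop the trailing "messages:" separator
        next unless $record =~ /\S/;    # skip the empty chunk before the first record
        my ( $timestamp, $struct ) = parse_record($record);
        push @records, { timestamp => $timestamp, fields => $struct };
    }
}
close $fh;

# Keep only a few fields per record.
my @summary = map {
    my $traffic = $_->{fields}{'system-notification-00257(traffic):'} || {};
    +{ start => $traffic->{start_time}, sent => $traffic->{sent}, rcvd => $traffic->{rcvd} };
} @records;

printf "%s sent=%s rcvd=%s\n", @{$_}{qw(start sent rcvd)} for @summary;
```

Note that the top-level keys keep their trailing colon, because that is what the capturing split returns.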
In any case, I don't think look-ahead regexes are needed here (though I'm sure there are ways to use them, and these might even make for more legible logic).
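For comparison, here is one place where a look-ahead could slot in: splitting the text into records. This is just a sketch, not a claim that it is the better approach. The two sample strings are stand-ins invented for the example, and it assumes the whole log fits in memory.

```perl
use strict;
use warnings;

# Two shortened stand-ins for the log records (made up for this sketch).
my $text = "messages:Dec 17 09:41:08 first record\n"
         . "messages:Dec 17 09:41:08 second record\n";

# A zero-width look-ahead splits *before* each "messages:" marker, so every
# piece keeps its own marker and no empty leading chunk is produced.
my @records = split /(?=messages:)/, $text;

print scalar(@records), " records\n";
print "  $_" for @records;
```

The trade-off is that this needs the whole text in one string, whereas setting $/ lets you read one record at a time from a filehandle.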
Another update: It occurs to me that you might run into some data records where there are line breaks in awkward places (other than the particular awkward spot shown in your data sample, between "proto" and "=6"). If that's the case, I think the code above will still do the right thing, but in the absence of appropriate test data, it's hard to be sure...
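Lacking real test data, one way to probe this is to feed the key=value loop a hand-made string with breaks in a few different places. The sketch below does that. The sub names parse_pairs and flatten and the test strings are invented for the example, and the loop is a lightly guarded copy of the inner loop above rather than the original code itself.

```perl
use strict;
use warnings;

# The inner key=value loop from the script above, pulled into a sub
# (parse_pairs is a hypothetical name).
sub parse_pairs {
    my ($data) = @_;
    my %pairs;
    while ( $data =~ s/^(.*?)=//s ) {
        ( my $key = $1 ) =~ s/\s+$//;
        if ( $data =~ s/^"([^"]+)"\s+// or $data =~ s/^(\S+)\s+// ) {
            $pairs{$key} = $1;
        }
    }
    return \%pairs;
}

# Turn a hash into a sorted string so two parses can be compared.
sub flatten {
    my ($h) = @_;
    return join ';', map { "$_=$h->{$_}" } sort keys %$h;
}

# Same content, once on one line and once with breaks after a quoted value,
# between a key and its "=", and after a plain value.
my $one_line = qq{start_time="2002-12-17 09:45:58" duration=5 proto=6 src zone=Trust sent=1034\n};
my $broken   = qq{start_time="2002-12-17 09:45:58"\nduration=5 proto\n=6 src zone=Trust\nsent=1034\n};

my $same = flatten( parse_pairs($one_line) ) eq flatten( parse_pairs($broken) );
print $same ? "same result\n" : "results differ\n";

# A break inside a two-word key is a different story: the newline stays in the key.
my $split_key = parse_pairs(qq{src\nzone=Trust\n});
my ($key) = keys %$split_key;
print "key contains a newline\n" if $key =~ /\n/;
```

So breaks between pairs, or between a key and its "=", look harmless, while a break inside a multi-word key (or inside a quoted value) would end up embedded in the stored string.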
In reply to Re: Parsing text files with a regex lookahead
by graff
in thread Parsing text files with a regex lookahead
by jalewis2