Re: pattern matching (greedy, non-greedy,...)

Here's an approach to use if you have all the 'log lines' as a single (possibly quite long) string, as the mention of KEY.*PATTERN in your OP suggests:

>perl -wMstrict -le
"my $s = 'KE a KE bb ccc KE ddd PA ee KE fff xx PA gg KEPA h';
 my $start     = qr{ KE }xms;
 my $not_start = qr{ (?! $start) . }xms;
 my $stop      = qr{ PA }xms;
 my $chunk     = qr{ $start $not_start* $stop }xms;
 print qq{'$s'};
 print map qq{'$_' }, $s =~ m{ $chunk }xmsg;
"
'KE a KE bb ccc KE ddd PA ee KE fff xx PA gg KEPA h'
'KE ddd PA' 'KE fff xx PA' 'KEPA'

This won't work if you are processing the file line-by-line. I'm working on that as, no doubt, are others.

Update: ... like toolic.

It should be mentioned that if you are processing a multi-line file slurp in which KE and PA each sit on a line of their own, the $start and $stop regexes should be something like qr{ ^ KE $ }xms and qr{ ^ PA $ }xms respectively; note the ^ $ embedded newline anchor metacharacters.
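A minimal sketch of that multi-line case, assuming KE and PA each occupy a whole line of the slurped text. The contents of $slurp are invented for illustration; the four qr// definitions follow the example above with only the anchors added:

```perl
use strict;
use warnings;

# Hypothetical slurped file contents: KE and PA each sit on their own line.
my $slurp = "junk\nKE\naaa\nKE\nbbb\nPA\nmore\nKE\nccc\nPA\n";

# With /m, ^ and $ also match at embedded newlines, so these patterns
# only match a line that consists of exactly KE or exactly PA.
my $start     = qr{ ^ KE $ }xms;
my $not_start = qr{ (?! $start) . }xms;
my $stop      = qr{ ^ PA $ }xms;
my $chunk     = qr{ $start $not_start* $stop }xms;

# List-context m//g with no capture groups returns each whole match.
my @chunks = $slurp =~ m{ $chunk }xmsg;
print "[$_]\n" for @chunks;
```

On this sample the first KE line is skipped because another KE line turns up before any PA line, which is the same behavior as in the single-line example.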


Re^2: pattern matching (greedy, non-greedy,...) by cacophony777 (Initiate) on Dec 17, 2009 at 01:22 UTC
Wow, thanks! Processing the entire file at once is fine for what I'm trying to do. Here's what I had written so far (I just started with Perl so be gentle):

    open (IN, 'input.txt') or die "$!";
    my $lines = do {local $/; <IN>};
    close IN;
    while ($lines =~ s/Key.+?value=(\d+).+?Screen:add.+?value=(\d+).+?Xml:sendRequest.+?value=(\d+).+?Xml:onResponse.+?value=(\d+).+?Xml:processing.+?value=(\d+)//s){
        # then I would use $1 - $5
    }

I'm not sure yet how to incorporate your solution into what I have, but perhaps I should do some more reading.

Also, to clarify, the file has multiple lines but the KEY and PATTERN values don't fall on their own line as my original example illustrates. I made it a bit too simplistic. It looks more like:

`BLAH BLAH BLAH KEY blah blah blah BlAH BLAH BLAH ABD KEY blah blah asdf asdf asdf asdf BLAH ASDF PATTERN blah blah`
Re^3: pattern matching (greedy, non-greedy,...) by AnomalousMonk (Archbishop) on Dec 17, 2009 at 01:52 UTC
It looks like you are using a `s///` substitution to repeatedly search from the very start of the string and then snip out already-processed substrings so that you don't encounter them again. It would be so much easier (and faster, if the string/file is huge) to use the `/g` modifier on a `m//` match and deal with each sub-string as it is found. See Modifiers in perlre; also see perlretut and perlrequick.

A little whitespace and formatting never hurts, either. See the `/x` modifier in the references above.

Another suggestion is to factor out sub-patterns with a collection of `qr//` regex object definitions (see references above). As with code in general, such factoring allows you to better understand and control the final regex. An example of such factoring is in the code of my original reply.

OTOH, since it looks like you may be trying to parse XML, the best advice might be to not use regexes at all; use one of the many fine XML parser modules from CPAN: see XML::Parser (others will be better able than I to advise you on this).
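A rough sketch of the `m//g` plus `qr//` suggestions taken together, assuming records shaped loosely like the ones in your regex. The contents of $log and the names $value, $record and @records are made up for illustration, and only two of your five fields are shown:

```perl
use strict;
use warnings;

# Hypothetical log text; the labels and value=N fields are invented.
my $log = join "\n",
    'Key pressed value=11',
    'noise',
    'Screen:add value=22',
    'Key pressed value=33',
    'Screen:add value=44',
    '';

# Factor the repeated "skip ahead to the next value" piece into a qr// object.
# Each use of $value contributes one capture group to the final regex.
my $value  = qr{ .+? value= (\d+) }xms;
my $record = qr{ Key $value .+? Screen:add $value }xms;

my @records;
# In scalar context, m//g resumes where the previous match ended,
# so the string itself is never modified.
while ($log =~ m{ $record }xmsg) {
    push @records, [ $1, $2 ];
}

print "key=$_->[0] screen=$_->[1]\n" for @records;
```

The loop body is where you would use `$1`, `$2` and so on, just as in your `s///` version, but nothing is deleted from `$log` along the way.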
Re^3: pattern matching (greedy, non-greedy,...) by AnomalousMonk (Archbishop) on Dec 17, 2009 at 02:15 UTC
"... the file has multiple lines but the KEY and PATTERN values don't fall on their own line as my original example illustrates."

No matter. Just don't use the `^ $` embedded newline anchors at the beginnings and ends of your start and stop patterns. (Of course, they can still be used elsewhere.) See discussions of the `/m` regex modifier (Modifiers) in perlre and other cited refs. The example string in Re: pattern matching (greedy, non-greedy,...) has no newlines in it at all, anywhere!

Update: Oops. This reply would have been better as an update to Re^3: pattern matching (greedy, non-greedy,...).
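For what it's worth, here is a sketch of the unanchored patterns run against text modeled on the sample quoted earlier in the thread. Where the line breaks fall in $text is an assumption, since the posted sample does not make that clear:

```perl
use strict;
use warnings;

# Sample modeled on the data shown earlier in the thread; the placement
# of the line breaks is an assumption.
my $text = join "\n",
    'BLAH BLAH BLAH',
    'KEY blah blah blah',
    'BlAH BLAH BLAH',
    'ABD KEY blah blah',
    'asdf asdf asdf',
    'BLAH ASDF PATTERN blah blah',
    '';

# No ^ or $ anchors: KEY and PATTERN may appear anywhere on a line.
my $start     = qr{ KEY }xms;
my $not_start = qr{ (?! $start) . }xms;
my $stop      = qr{ PATTERN }xms;
my $chunk     = qr{ $start $not_start* $stop }xms;

# Because of /s, the dot in $not_start also crosses newlines.
my @found = $text =~ m{ $chunk }xmsg;
print "[$_]\n" for @found;
```

Only the last KEY before PATTERN should start a chunk here, because the first KEY is followed by another KEY before any PATTERN appears.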