Regexes and /o

l2kashe has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I'm looking for some wisdom in the wild and woolly world of regexes, and hoping someone can shed some light.

I am processing server logs (DHCP, to be exact), which present some interesting flaming hoops to jump through. Each transaction is a cluster of 3 lines, but those lines aren't necessarily one after the other in the file, because the server logs the info as soon as it can rather than waiting for the entire transaction. No problem, I can deal with that.

My question concerns the /o modifier. I see examples of using /o like

$match = '(foo|baz|bar)';

while (<INPUT>) {
   next if ($_ !~ /$match/o);
}

Simple, straightforward, avoids regex recompilation. Now, how does Perl keep track of those compiled regexes, and to what depth does the optimization continue? For example:

$m_1 = '(bar|baz|foo)';
$m_2 = '([Bb]lah|[Cc]ore)';
$m_3 = '(root|sys|user)';

while (<INPUT>) {
   next if ($_ !~ /$m_1/o);

   chomp($line = $_);

   if ($line =~ /$m_2/o && $something) {
      &func("param");

   } elsif ($line =~ /$m_3/o && $something_else) {
      &other("var");
   }
}

And will the optimization be useful within subs?
For example, if a my'd variable is defined as $f = 'blah'; and used in a regex within the sub, is it a waste to use the /o modifier, since the variable goes in and out of scope with each call? I believe that subs are compiled at runtime and simply wait for calls to them, do what they are supposed to do, and return. Will defining the regex with /o make it be compiled once (at the same time as the sub) and retained until the program exits, or will it be recompiled each time the sub is entered?

Does Perl keep track of each of those regex tokens (sorry for not knowing the right term there) separately? Will /o provide the functionality I am looking for? Is there a better way to approach the match? The program is fairly lengthy, IMO, for what it does (simply processing a DHCP log), but there are so many exceptions. I am trying to find the tightest way to use the flexibility of regexes while using as little CPU and memory as possible. I have segregated my functions, streamlined the data processing to as few tests and calls as possible, localized my vars via my(), set array elements to 0 as opposed to undef to save processing time, and I still need to squeeze a bit more out of it.
Just looking for insight/opinions/pointers. Thanks.


Replies are listed 'Best First'.
Re: Regexes and /o by rir (Vicar) on Oct 08, 2002 at 23:23 UTC
Once means once. The regex is compiled the first time the line is executed, and Perl never checks the variable again. The exception is `eval`, which lets you have both the optimization and the flexibility. You can get your `$patt` value as usual, but use it inside a string `eval`: each `eval` compiles its code afresh, so the `/o` inside it picks up the current pattern. This can be a big win.

for ( XXX some loop ) {
    $patt = gen_patt( $blah);
    eval 'for $i ( @lines) { if ( $i =~ m/$patt/o){ process( $i); } }';
    die "eval error" if $@;
}
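To make "once means once" concrete, and to cover the question about subs, here is a minimal sketch. The sub names `matches_once` and `matches_fresh` are made up for illustration. It assumes nothing beyond core Perl: the `/o` regex inside the sub is compiled on the first call and keeps that pattern for the rest of the run, even though the my'd variable holds a different value on later calls.

```perl
use strict;
use warnings;

# Hypothetical helper: pattern is locked in by /o on the first call.
sub matches_once {
    my ($pattern, $line) = @_;
    return $line =~ /$pattern/o ? 1 : 0;
}

# Same test without /o: the current value of $pattern is always used.
sub matches_fresh {
    my ($pattern, $line) = @_;
    return $line =~ /$pattern/ ? 1 : 0;
}

# The second call asks for 'bar', but the /o version still matches against 'foo'.
my @with_o    = (matches_once('foo', 'foo'),  matches_once('bar', 'bar'));
my @without_o = (matches_fresh('foo', 'foo'), matches_fresh('bar', 'bar'));

print "with /o:    @with_o\n";     # with /o:    1 0
print "without /o: @without_o\n";  # without /o: 1 1
```

So inside a sub, /o is only safe when the pattern can never change between calls.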
Re: Re: Regexes and /o by Anonymous Monk on Oct 09, 2002 at 01:05 UTC
Just wanted to make sure I was getting what I expected, and according to this I am. I was thinking about playing with eval, but I was also trying to leave the code as legible as possible for other admins who haven't played with Perl as much. I think that going with eval is my next step. Thanks for the help.
Re: Regexes and /o by blakem (Monsignor) on Oct 09, 2002 at 04:02 UTC
What version of perl are you using? In recent versions (5.6.0 and later, I believe) the penalty for not using `/o` was greatly reduced. So `/o` isn't really much of a gain anymore, because the worst-case scenario it was designed to avoid is no longer applicable. -Blake
(tye)Re: Regexes and /o by tye (Sage) on Oct 09, 2002 at 06:20 UTC
Yes! Precisely! And since you can use qr// when you want maximum speed and maximum control, there is no reason to ever use /o. Ever! It should be deprecated from the language, because the best it can do now is provide a slight speed boost (which you can still get by using qr// instead), but the worst it can do is break your code. Most of the time, when I see people use or recommend /o these days, it is in cases where it makes absolutely no difference at all, or where it breaks their code. - tye (see also /o considered harmful)
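As a rough sketch of the qr// approach applied to the loop from the original question: the three patterns are compiled once, up front, and then reused on every line. The sample `@input` lines are invented stand-ins for the DHCP log, and the `$something` conditions and the sub calls from the question are left out to keep the example self-contained.

```perl
use strict;
use warnings;

# Each qr// is compiled once, here, and reused for every line below.
my $m_1 = qr/(bar|baz|foo)/;
my $m_2 = qr/([Bb]lah|[Cc]ore)/;
my $m_3 = qr/(root|sys|user)/;

# Invented sample data standing in for the log file.
my @input = ("foo Blah\n", "nothing here\n", "baz root\n", "bar only\n");

my @seen;
for my $line (@input) {
    next if $line !~ $m_1;      # skip lines with none of bar/baz/foo
    chomp $line;

    if ($line =~ $m_2) {
        push @seen, "m_2: $line";
    } elsif ($line =~ $m_3) {
        push @seen, "m_3: $line";
    }
}

print "$_\n" for @seen;
# m_2: foo Blah
# m_3: baz root
```

Unlike /o, nothing here is frozen behind your back: assign a new qr// to `$m_2` and the next match uses it.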
Re: Regexes and /o by perrin (Chancellor) on Oct 08, 2002 at 23:53 UTC
There's a very good writeup in the Perl Cookbook about this. It covers various ways you can handle regexes that you want to keep compiled for a while (like within a specific loop) but not for the entire life of your program.
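One way to get "compiled for this loop only" is sketched below. It is not taken from the Cookbook; it just uses qr// scoped to the outer loop, and the word list and sample lines are invented. The pattern is rebuilt once per outer pass and reused for every line in the inner loop, with no string eval needed.

```perl
use strict;
use warnings;

# Invented sample data.
my @lines = ('lease for alpha', 'lease for beta', 'release beta', 'alpha again');

my %count;
for my $word ('alpha', 'beta') {
    # Compiled once per outer pass; \Q...\E keeps $word literal.
    my $patt = qr/\b\Q$word\E\b/;

    for my $line (@lines) {
        $count{$word}++ if $line =~ $patt;
    }
}

print "$_: $count{$_}\n" for sort keys %count;
# alpha: 2
# beta: 2
```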