Greetings,
Looking for some wisdom in the wild and woolly world of regexes, and hoping someone can shed some light.
I am processing server logs (DHCP, to be exact), which present some interesting flaming hoops to jump through. Each transaction is a cluster of 3 lines, but those lines aren't necessarily one after the other in the file, because the server logs the info as soon as it can rather than waiting for the entire transaction. No problem, I can deal with that.
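For illustration only, here is a minimal sketch of that kind of grouping. It assumes every line carries some transaction identifier; the `xid=` field and the sample lines are made up for the example and are not the real DHCP log format.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the "xid=" field and the message text are
# invented for illustration and are not the real DHCP log format.
my @log = (
    'xid=a1 request from client-1',
    'xid=b2 request from client-2',
    'xid=a1 offer to client-1',
    'xid=b2 offer to client-2',
    'xid=a1 ack for client-1',
);

my %pending;     # transaction id => array ref of lines seen so far
my @complete;    # array refs of finished 3-line transactions

for my $line (@log) {
    # Pull out the (assumed) transaction id; skip lines that have none.
    my ($xid) = $line =~ /^xid=(\S+)/ or next;

    push @{ $pending{$xid} }, $line;

    # Once all 3 lines of a transaction have arrived, move it out of %pending.
    if (@{ $pending{$xid} } == 3) {
        push @complete, delete $pending{$xid};
    }
}

printf "%d complete, %d still pending\n",
    scalar @complete, scalar keys %pending;
```

Keying the hash on the identifier means the order of lines in the file stops mattering, and deleting finished transactions keeps memory use tied to the number of transactions still in flight.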
My question is about the /o modifier. I see examples of using /o like this:
$match = '(foo|baz|bar)';
while (<INPUT>) {
    next if ($_ !~ /$match/o);
}
Simple, straightforward, and it avoids regex recompilation.
Now, how does Perl keep track of those compiled regexes, and to what depth does the optimization continue?
For example:
$m_1 = '(bar|baz|foo)';
$m_2 = '([Bb]lah|[Cc]ore)';
$m_3 = '(root|sys|user)';
while (<INPUT>) {
    next if ($_ !~ /$m_1/o);
    chomp($line = $_);
    if ($line =~ /$m_2/o && $something) {
        &func("param");
    } elsif ($line =~ /$m_3/o && $something_else) {
        &other("var");
    }
}
And will the optimization be useful within subs?
For example, if a my'd variable is defined as $f = 'blah'; and used in a regex within the sub, is it a waste to use the /o modifier, given that the variable goes into and out of scope with each call? I believe that subs are compiled at runtime and simply wait for calls to them, do what they are supposed to, and return. Will defining the regex with /o make it be compiled once (at the same time as the sub) and retained until the program exits, or will it be recompiled each time the sub is entered?
Does Perl keep track of each of those regex tokens (sorry for not knowing the right term there) separately?
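A small test along these lines might help settle the sub question. It is only a sketch: `match_once` is a hypothetical helper, and the patterns and strings are made up. The idea is to call the sub with different patterns and see whether the later ones are ever used.

```perl
use strict;
use warnings;

# Hypothetical helper: matches $text against a pattern held in a my'd
# variable, using /o on the match operator inside the sub.
sub match_once {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/o ? 1 : 0;
}

my @results = (
    match_once('blah', 'blah blah'),     # first call: pattern is 'blah'
    match_once('core', 'core dump'),     # pattern passed in is now 'core'
    match_once('core', 'blah again'),    # 'core' again, text contains 'blah'
);

# If /o pins the first pattern for the life of the program, the second
# call fails and the third succeeds, even though 'core' was passed in.
print "@results\n";
```

As far as I understand the documentation, /o applies to that one match operator in the compiled sub rather than to the variable, so leaving and re-entering the sub should not trigger a recompile, and a changed pattern is silently ignored after the first call.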
Will the /o provide the functionality I am looking for?
Is there a better way to approach the match?
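One alternative that may be worth comparing is qr//, which compiles each pattern into its own regex object up front. The sketch below reuses the three patterns from above; the sample lines are invented, and the $something / $something_else conditions are left out to keep it short.

```perl
use strict;
use warnings;

# Each qr// yields a separately compiled regex object.
my $m_1 = qr/(bar|baz|foo)/;
my $m_2 = qr/([Bb]lah|[Cc]ore)/;
my $m_3 = qr/(root|sys|user)/;

# Hypothetical input lines, standing in for <INPUT>.
my @lines = (
    'foo started Core service',
    'bar login by root',
    'nothing to see here',
    'baz idle',
);

my @hits;
for my $line (@lines) {
    next unless $line =~ $m_1;    # cheap pre-filter, as in the loop above

    if ($line =~ $m_2) {
        push @hits, "m_2: $line";
    } elsif ($line =~ $m_3) {
        push @hits, "m_3: $line";
    }
}

print "$_\n" for @hits;
```

Because the compiled objects live in ordinary variables, they can be passed into subs or stored in a hash, and a later change to a pattern means building a new qr// object rather than being silently ignored.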
The program is fairly lengthy, IMO, for what it does (simply processing a DHCP log), but there are so many exceptions. I am trying to find the tightest way to use the flexibility of regexes while using as little CPU and memory as possible. I have segregated my functions, streamlined data processing to as few tests and calls as possible, localized my vars via my(), and set array elements to 0 as opposed to undef to save processing time, and I still need to squeeze a bit more out of it.
Just looking for insight/opinions/pointers.
Thanks