
Greetings. I'm looking for some wisdom in the wild and woolly world of regexes, and hoping someone can shed some light.

I am processing server logs (DHCP, to be exact), which present some interesting flaming hoops to jump through. Each transaction is a cluster of 3 lines, but those lines aren't necessarily one after the other in the file, because the server logs the info as soon as it can rather than waiting for the entire transaction. No prob, I can deal with that.

My question is about the /o modifier. I see examples of using /o like this:

$match = '(foo|baz|bar)';

while (<INPUT>) {
   next if ($_ !~ /$match/o);
}

Simple, straightforward, and it avoids regex recompilation. Now, how does Perl keep track of those compiled regexes, and to what depth does the optimization continue? For example:

$m_1 = '(bar|baz|foo)';
$m_2 = '([Bb]lah|[Cc]ore)';
$m_3 = '(root|sys|user)';

while (<INPUT>) {
   next if ($_ !~ /$m_1/o);

   chomp($line = $_);

   if ($line =~ /$m_2/o && $something) {
      &func("param");

   } elsif ($line =~ /$m_3/o && $something_else) {
      &other("var");
   }
}

And will the optimization be useful within subs?
For example, if a my'd variable is defined as $f = 'blah'; and used in a regex within the sub, is it a waste to use the /o modifier, since the variable goes into and out of scope on every call? I believe that subs are compiled once and then simply wait for calls, do what they are supposed to, and return. Will defining the regex with /o make it be compiled once (at the same time as the sub) and retained until the program exits, or will it be recompiled each time the sub is entered?
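One way to probe this empirically is a small test like the sketch below. The sub name match_once is made up for illustration. It relies on the perlop-documented behavior that /o interpolates the pattern only the first time that match operator runs, so later values of the variable are ignored, even when the variable is a fresh lexical on each call.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): matches $text
# against the interpolated $pat using /o.
sub match_once {
    my ($pat, $text) = @_;
    # With /o the pattern is interpolated and compiled only on the first
    # execution of this match operator; later values of $pat are ignored.
    return $text =~ /$pat/o ? 1 : 0;
}

my @results = (
    match_once('foo', 'foo'),   # compiles /foo/ and matches
    match_once('bar', 'bar'),   # still uses /foo/, so 'bar' does not match
);
print "@results\n";             # prints "1 0"
```

If that holds on your perl, /o inside a sub does survive across calls, but it also silently pins the first pattern it ever saw, which is only safe when the variable never changes.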

Does Perl keep track of each of those regex tokens (sorry for not knowing the right term there) separately? Will /o provide the functionality I am looking for? Is there a better way to approach the match? The program is fairly lengthy, IMO, for what it does (it simply processes a DHCP log), but there are so many exceptions. I am trying to find the tightest way to use the flexibility of regexes while using as little CPU and memory as possible. I have segregated my functions and streamlined data processing to as few tests and calls as possible, localized my vars via my(), set array elements to 0 as opposed to undef to save processing time, and I still need to squeeze a bit more out of it.
Just looking for insight/opinions/pointers. Thanks.
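As one possible alternative to /o, the patterns could be precompiled with qr// (available since Perl 5.005). The sketch below reuses the three patterns from the example above; the classify sub and its return labels are hypothetical stand-ins for the real loop body, and the $something / $something_else conditions are left out.

```perl
use strict;
use warnings;

# Each qr// compiles its pattern once, here, and yields a reusable
# regex object, so no /o is needed at the match sites.
my $m_1 = qr/(bar|baz|foo)/;
my $m_2 = qr/([Bb]lah|[Cc]ore)/;
my $m_3 = qr/(root|sys|user)/;

# Hypothetical classifier standing in for the body of the while loop.
sub classify {
    my ($line) = @_;
    return 'skip'   unless $line =~ $m_1;   # same role as the "next if" test
    return 'first'  if $line =~ $m_2;
    return 'second' if $line =~ $m_3;
    return 'other';
}

print classify("foo Blah"), "\n";
```

Because each compiled regex lives in its own variable, the three patterns are tracked separately, and passing them into subs works like passing any other scalar.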

In reply to Regexes and /o by l2kashe
