
Maybe I should explain what I'm trying to do. Basically, the script accepts a multi-line search pattern as input. It then checks whether the pattern appears in the log. If so, it returns the segment of the log that matched the pattern. Otherwise, it returns nothing. The output gets parsed downstream, and specific pieces of data are pulled out (hostname, phone number, problem report, etc.) and sent to a pager.

I had originally implemented this as a pair of nested foreach loops, similar to what masem suggested. The outer loop checked one line of the log at a time, and when the line matched the top line of the pattern, the inner loop would check the rest of the lines in the pattern against the next few lines in the log.
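For reference, here is a minimal sketch of that nested-loop approach. It is a reconstruction, not my production code: the `find_segment` helper, the sample log lines, and the per-line patterns are all made up for illustration, and it assumes each pattern line is a precompiled regex that must match consecutive log lines.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the log lines matched by a multi-line
# pattern, or an empty list when the pattern does not appear.
sub find_segment {
    my ($log, $pattern) = @_;    # array refs: log lines, compiled regexes
    my $last_start = @$log - @$pattern;

    # Outer loop: try every log line as a possible start of the match.
    START: for my $i (0 .. $last_start) {
        # Inner loop: check each pattern line against the following log lines.
        for my $j (0 .. $#$pattern) {
            next START unless $log->[$i + $j] =~ $pattern->[$j];
        }
        return @{$log}[$i .. $i + $#$pattern];
    }
    return;
}

# Made-up sample data.
my @log = (
    'boot ok',
    'host: alpha',
    'phone: 555-0100',
    'problem: disk full',
    'shutdown',
);
my @pattern = (qr/^host: \w+/, qr/^phone: [\d-]+/, qr/^problem: /);

my @segment = find_segment(\@log, \@pattern);
print "$_\n" for @segment;
```

In the worst case every starting line runs the whole inner loop, which is where the cost comes from on large logs.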

The problem with this was that for large logs, it would take a long time to complete and use a lot of CPU in the process. I realized that 5000 lines of log data times 20 lines per search pattern times 20 search patterns means up to 2,000,000 line comparisons per pass in the worst case. I was hoping that taking advantage of Perl's regex engine could help cut this back. For most data, the regex method is orders of magnitude faster. Where the foreach method would take a full minute to parse through several thousand lines of text, the regex method typically takes under a second to zip through tens of thousands of lines. However, there are a few combinations of search patterns and log segments that seem to bog it down.
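The general shape of the regex method is roughly the following. This is only a sketch under assumptions: the pattern fragments and log text are invented, and it assumes the whole log has been read into one string so that a single regex can span several lines.

```perl
use strict;
use warnings;

# Hypothetical pattern lines; each is a regex fragment for one log line.
my @pattern_lines = ('host: \w+', 'phone: [\d-]+', 'problem: [^\n]*');

# Join the fragments with \n so that one regex spans consecutive lines.
my $joined = join '\n', @pattern_lines;
my $regex  = qr/^($joined)$/m;    # /m lets ^ and $ anchor at line boundaries

# Made-up log, held as a single string.
my $log = join "\n", 'boot ok', 'host: alpha', 'phone: 555-0100',
                     'problem: disk full', 'shutdown';

# Capture the matching segment, or undef when the pattern is absent.
my ($segment) = $log =~ $regex;
print defined $segment ? "$segment\n" : "no match\n";
```

Using `[^\n]*` instead of `.*` with `/s` keeps each fragment confined to its own line, which limits how far the engine can wander when a later fragment fails.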

I am pretty new to writing regular expressions, in general. I had tried using .*? at one point, but I've been experimenting a lot in the process of troubleshooting. I think the reason I was using /(?:.(?!foo))*/ as opposed to /.*?foo/ was that I didn't want to match the line delimiter. I guess that doesn't matter, though, since I s// it out later, anyway. In any case, the regex engine gets bogged down with certain data either way.
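To make the difference between those forms concrete, here is a small sketch on a made-up string. One thing it shows is that putting the lookahead after the dot, as in `/(?:.(?!foo))*/`, stops one character short of `foo`, whereas putting it before the dot consumes everything up to `foo`, the same prefix that `.*?` captures. Note also that without the `/s` modifier, `.` never matches a newline in any of these forms.

```perl
use strict;
use warnings;

my $text = 'abcfoo';    # made-up sample input

# Lookahead after the dot: the character right before "foo" is rejected.
my ($after)  = $text =~ /^((?:.(?!foo))*)/;

# Lookahead before the dot: consumes every character up to "foo".
my ($before) = $text =~ /^((?:(?!foo).)*)/;

# Lazy quantifier: shortest prefix that is followed by "foo".
my ($lazy)   = $text =~ /^(.*?)foo/;

print "after:  '$after'\n";     # after:  'ab'
print "before: '$before'\n";    # before: 'abc'
print "lazy:   '$lazy'\n";      # lazy:   'abc'
```

This only illustrates what each form matches. It does not by itself explain the slowdown on particular data.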

It may be that I should go back to using while and foreach loops and find other ways to optimize the process, but I was hoping there was something profoundly broken (and therefore fixable) about the way I'm using regex.

examine
