Maybe I should explain what I'm trying to do. Basically, the script accepts a multi-line search pattern as input. It then checks whether the pattern appears in the log. If so, it returns the segment of the log that matched the pattern. Otherwise, it returns nothing. The output gets parsed downstream, and specific pieces of data are pulled out (hostname, phone number, problem report, etc.) and sent to a pager.
I had originally implemented this as a pair of nested foreach loops, similar to what masem suggested. The outer loop checked one line of the log at a time, and when the line matched the top line of the pattern, the inner loop would check the rest of the lines in the pattern against the next few lines in the log.
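A minimal sketch of that nested-loop approach, not the original script. It assumes each pattern line is a compiled regex tested against exactly one log line; the helper name `find_segment` and the sample log lines are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the matching log lines, or an empty list.
# Assumes $log and $pattern are array refs, one element per line.
sub find_segment {
    my ($log, $pattern) = @_;
    my $last_start = @$log - @$pattern;    # last index where the pattern still fits

    START:
    foreach my $i (0 .. $last_start) {
        # Inner loop: check every pattern line against the log lines from $i on.
        foreach my $j (0 .. $#$pattern) {
            next START unless $log->[$i + $j] =~ $pattern->[$j];
        }
        return @{$log}[$i .. $i + $#$pattern];    # all pattern lines matched
    }
    return;    # no match anywhere in the log
}

# Made-up sample data.
my @log     = ('boot ok', 'host: web01', 'phone: 555-0100', 'done');
my @pattern = (qr/^host: \S+$/, qr/^phone: [\d-]+$/);

print join("\n", find_segment(\@log, \@pattern)), "\n";
```

In the worst case the inner loop runs once per pattern line for every log line, which is where the cost on large logs comes from.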
The problem with this was that for large logs, it would take a long time to complete and use a lot of CPU in the process. I realized that 5000 lines of log data times 20 lines per search pattern times 20 search patterns means up to 2,000,000 line comparisons per run. I was hoping that taking advantage of Perl's regex engine could help cut this back. For most data, the regex method is orders of magnitude faster. Where the foreach method would take a full minute to parse through several thousand lines of text, the regex method typically takes under a second to zip through tens of thousands of lines. However, there are a few combinations of search patterns and log segments that seem to bog it down.
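A rough sketch of what the single-regex approach could look like, under simplifying assumptions that may not hold for the real script: the pattern lines are literal text (hence `quotemeta`), and the log is one string with "\n" line endings. The helper name `find_segment_regex` and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: joins the pattern lines into one regex and runs it
# once over the whole log. Returns the matched segment, or undef.
sub find_segment_regex {
    my ($log_text, @pattern_lines) = @_;

    # Escape each line so it is matched literally, then glue with newlines.
    my $body = join "\n", map { quotemeta($_) } @pattern_lines;

    # /m lets ^ and $ anchor at line boundaries inside the log string.
    return $log_text =~ /^($body)$/m ? $1 : undef;
}

# Made-up sample data.
my $log_text = join "\n", 'boot ok', 'host: web01', 'phone: 555-0100', 'done';
my $segment  = find_segment_regex($log_text, 'host: web01', 'phone: 555-0100');

print defined $segment ? "$segment\n" : "no match\n";
```

Here the regex engine scans the log once per search pattern, rather than the script stepping through it line by line.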
I am pretty new to writing regular expressions, in general. I had tried using .*? at one point, but I've been experimenting a lot in the process of troubleshooting. I think the reason I was using /(?:.(?!foo))*/ as opposed to /.*?foo/ was that I didn't want to match the line delimiter. I guess that doesn't matter, though, since I s// it out later, anyway. In any case, the regex engine gets bogged down with certain data either way.
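For comparison, a small sketch of how these forms behave on a toy string. The input and variable names are invented for illustration. The third pattern, with the lookahead placed before the dot, is not from the post; it is the more common way to write this kind of construct.

```perl
use strict;
use warnings;

my $text = 'abcfoo';    # toy input, invented for illustration

# Lookahead after the dot: each character must not be FOLLOWED by "foo".
my ($lookahead_after) = $text =~ /^((?:.(?!foo))*)/;

# Lookahead before the dot: each character must not START "foo".
my ($lookahead_before) = $text =~ /^((?:(?!foo).)*)/;

# Lazy quantifier: take as little as possible until "foo" matches.
my ($lazy) = $text =~ /^(.*?)foo/;

print "after:  $lookahead_after\n";
print "before: $lookahead_before\n";
print "lazy:   $lazy\n";
```

On this input the first form captures "ab", stopping one character short of "foo", while the other two capture "abc". Without the /s modifier, the dot does not match a newline in any of the three.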
It may be that I should go back to using while and foreach loops and find other ways to optimize the process, but I was hoping there was something profoundly broken (and therefore fixable) about the way I'm using regex.
examine
In reply to Re: Regex runs for too long, in thread Regex runs for too long