You show us neither the log data you're matching against nor the strings you're searching for. Since you say you're mostly looking for matches at the end of the line, it might be worthwhile to reverse each line and look for the reversed word anchored at the start instead, where a failed match fails fast. See sexeger.
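A minimal sketch of that idea; the suffix 'error.log' is invented, and a fixed-string reversal like this only covers the simplest case:

    use strict;
    use warnings;

    # Hypothetical search string expected at the end of each line
    my $suffix = 'error.log';

    # Reverse the pattern once and anchor it at the start. Matching the
    # reversed line against a start-anchored pattern lets the engine
    # give up immediately instead of scanning the whole line.
    my $reversed = reverse $suffix;          # "gol.rorre"
    my $rev_re   = qr/^\Q$reversed\E/;

    while (my $line = <STDIN>) {
        chomp $line;    # otherwise the newline ends up at the front
        print "$line\n" if reverse($line) =~ $rev_re;
    }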
You are converting some glob patterns to regular expressions. Depending on what your glob patterns look like, you can gain a lot by applying your domain knowledge. For example, you likely know that all your strings are anchored to the end of the line. Also, if you store the compiled regular expressions instead of recompiling them from strings on every iteration (keys %regexp), you should gain a bit of performance, as sketched below.
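A sketch of precompiling with qr//, with invented patterns standing in for your %regexp:

    use strict;
    use warnings;

    # Hypothetical glob-derived patterns, all anchored to the end of
    # the line, standing in for %regexp from your code.
    my %regexp = (
        'error'  => qr/error\.log$/,
        'access' => qr/access_\d+\.log$/,
    );

    # Each qr// pattern is compiled exactly once, not re-parsed from a
    # string on every line.
    my @compiled = values %regexp;

    while (my $line = <STDIN>) {
        for my $re (@compiled) {
            if ($line =~ $re) {
                print $line;
                last;    # this line matched; skip the remaining patterns
            }
        }
    }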
Another approach is to build one large regular expression from all your patterns, so that the regex engine does the looping instead of Perl. See Regexp::Assemble, for example, or Regexp::Trie (although the latter shouldn't be necessary on 5.10, whose regex engine applies the trie optimisation to alternations of literals by itself).
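A sketch with Regexp::Assemble, reusing the invented patterns from above:

    use strict;
    use warnings;
    use Regexp::Assemble;

    # Merge all patterns into one regex so the engine, not a Perl-level
    # loop, tries the alternatives.
    my $ra = Regexp::Assemble->new;
    $ra->add('error\.log$', 'access_\d+\.log$');
    my $big_re = $ra->re;

    while (my $line = <STDIN>) {
        print $line if $line =~ $big_re;
    }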
Also consider that IO may well be the limiting factor when reading the file. Storing your logfile compressed and then spawning gzip -cd $logfile| may or may not improve the situation, depending on whether disk/network IO is what's limiting you.
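That could look like the following; the filename is hypothetical, and the list form of open sidesteps the shell:

    use strict;
    use warnings;

    # Reading a compressed logfile through gzip trades CPU for IO
    # bandwidth, which only helps if IO is actually the bottleneck.
    my $logfile = 'access.log.gz';

    open my $fh, '-|', 'gzip', '-cd', $logfile
        or die "Can't spawn gzip: $!";

    while (my $line = <$fh>) {
        # ... run the pattern matching here, as before ...
    }
    close $fh or warn "gzip exited abnormally: $?";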
In your code, you do
    for (...) { next if $do_not_print;

You can stop iterating through that loop entirely by using last instead of next at the point where you set $do_not_print to 1, as sketched below.
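Something like this, guessing at the surrounding loop; only $do_not_print comes from your code, the rest is invented for illustration:

    for my $re (@compiled) {
        if ($line =~ $re) {
            $do_not_print = 1;
            last;    # leave the loop right away; 'next' would keep
                     # spinning through the remaining patterns
        }
    }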