
You show us neither the log data you're matching against nor the strings you're searching for. Since you say that you're mostly looking for stuff at the end of the line, it might be worthwhile to reverse the string and look for the reversed word at the start of the string. See sexeger.
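
As a rough sketch of the reversal idea, assuming the search terms are plain literal strings and the line has already been chomped (the helper name ends_with_reversed is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: true if $line ends with the literal string $suffix.
# Both strings are reversed, so the match is anchored at the start instead
# of the end of the string.
sub ends_with_reversed {
    my ($line, $suffix) = @_;
    my $rev_line   = reverse $line;      # scalar context reverses the characters
    my $rev_suffix = reverse $suffix;
    return $rev_line =~ /\A\Q$rev_suffix\E/ ? 1 : 0;
}
```

Whether this beats a plain end-anchored match depends on your data and patterns, so benchmark it against your real log before committing to it.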

You are converting some glob patterns to regular expressions. Depending on what your glob patterns look like, you can gain a lot by applying your domain knowledge. For example, you will likely know that all your strings are anchored to the end of the line. Also, if you store the compiled regular expressions instead of recompiling them from a string every time (keys %regexp), you will likely gain a bit of performance.
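
A minimal sketch of compiling each pattern once, assuming your globs only use * and ? as wildcards and should always match at the end of a chomped line (glob_to_regex, compile_globs and matches_any are hypothetical names, not taken from your code):

```perl
use strict;
use warnings;

# Hypothetical glob translation: '*' and '?' are wildcards, every other
# character is matched literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        if    ($ch eq '*') { $re .= '.*' }
        elsif ($ch eq '?') { $re .= '.' }
        else               { $re .= quotemeta $ch }
    }
    return qr/$re\z/;    # anchored to the end of the string
}

# Compile every glob once, up front, and keep the compiled regex objects.
sub compile_globs {
    my @globs = @_;
    return map { glob_to_regex($_) } @globs;
}

# True if any of the precompiled regexes matches the line.
sub matches_any {
    my ($line, @compiled) = @_;
    for my $re (@compiled) {
        return 1 if $line =~ $re;
    }
    return 0;
}
```

You would call compile_globs once before reading the log and pass the resulting list to matches_any for every line, instead of building the regex from its string form inside the loop.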

Another thing might be to build one large regular expression from your patterns, so the regex engine does the loop instead of Perl doing the loop. See Regexp::Assemble for example, or Regexp::Trie (although that one shouldn't be necessary if you're using Perl 5.10 or later, where the regex engine optimizes alternations of literal strings into a trie by itself).
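
A sketch of the combined regex using only core Perl, assuming the patterns are literal strings that must appear at the end of a chomped line (build_combined is a made-up name; Regexp::Assemble would take the place of the join for real regex fragments):

```perl
use strict;
use warnings;

# Hypothetical helper: build one end-anchored regex from a list of literals.
sub build_combined {
    my @literals = @_;
    die "need at least one pattern" unless @literals;
    # Longer alternatives first, so a short literal never shadows a longer one.
    my $alt = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @literals;
    return qr/(?:$alt)\z/;
}
```

Each line is then tested once against the combined regex instead of once per pattern.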

Also consider that IO might well be a limiting factor while trying to read the file. Storing your logfile compressed and then spawning gzip -cd $logfile| might or might not improve the situation, depending on whether disk/network IO is limiting you or not.
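
Reading through gzip could look roughly like this, assuming a Unix-like system with gzip on the PATH (count_matches_gz is a hypothetical helper that only counts matching lines):

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines of a gzip-compressed log matching $re.
sub count_matches_gz {
    my ($logfile, $re) = @_;
    # List-form pipe open: no shell is involved, so the file name
    # needs no quoting.
    open my $fh, '-|', 'gzip', '-cd', $logfile
        or die "Cannot run gzip on $logfile: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $count++ if $line =~ $re;
    }
    close $fh or die "gzip failed on $logfile (exit status $?)";
    return $count;
}
```

Time this against reading the uncompressed file. If the CPU rather than the disk or network is your bottleneck, the extra decompression work can make things slower.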

In your code, you do

for (...) {
    next if $do_not_print;
    ...
}

Instead of skipping every remaining iteration with next, you can leave the loop with last at the point where you set $do_not_print to 1.
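
Since I don't have the rest of your loop, here is only a sketch of the idea, assuming the loop runs over a list of compiled exclusion patterns (should_print and its arguments are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if no exclusion pattern matches the line.
sub should_print {
    my ($line, @exclude) = @_;
    my $do_not_print = 0;
    for my $re (@exclude) {
        if ($line =~ $re) {
            $do_not_print = 1;
            last;    # the decision is made, skip the remaining patterns
        }
    }
    return $do_not_print ? 0 : 1;
}
```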

In reply to Re: regexp performance on large logfiles by Corion
in thread regexp performance on large logfiles by snl_JYDawg
