This does not address the scalability issue, but it will speed things up. If most of the lines are being rejected by the first few REs, it might speed it up a lot:

    LINE: foreach my $line (@SYSLOG) {
        foreach my $r (@REGEXPS) {
            next LINE if $line =~ /$r/;
        }
        push @KEEP, $line;
    }
Beyond that, if you had simpler regular expressions you could try tricks like combining them into a trie to stop the RE engine from doing so much redundant work. Unfortunately that will be hard with the examples that you gave. However, you might take some of your complex REs which are closely related and find ways to combine them, as sketched below.
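For instance, here is a minimal sketch of folding all the patterns into a single alternation that is compiled once, so each line is scanned in one pass instead of once per pattern. It reuses the @REGEXPS/@SYSLOG/@KEEP names from the loop above; everything else is an assumption, not your code:

    # Wrap each pattern in a non-capturing group and join them with '|',
    # then compile the result once.
    my $combined    = join '|', map { "(?:$_)" } @REGEXPS;
    my $combined_re = qr/$combined/;

    # Keep only the lines that match none of the patterns.
    my @KEEP = grep { $_ !~ $combined_re } @SYSLOG;

Whether this actually wins depends on the patterns; a huge alternation can be slower than a handful of cheap matches that usually fail early, so benchmark it against the loop version.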
Incidentally, I see that a lot of the work being done by your REs looks like parsing the syslog format for various specific strings. If that is so, then you could parse each line into a small data structure up front and replace many of your current RE matches with much simpler tests against the parsed fields.
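A rough sketch of that idea, assuming the traditional "Mon DD HH:MM:SS host program[pid]: message" layout (the field names and the sshd test are illustrative assumptions, not taken from your code):

    # Split each syslog line into named fields once, then test the fields
    # with cheap comparisons instead of full-line regex matches.
    for my $line (@SYSLOG) {
        next unless $line =~ /^(\w{3}\s+\d+\s+[\d:]{8})\s+(\S+)\s+([^\s\[:]+)(?:\[(\d+)\])?:\s+(.*)$/;
        my %entry = (
            date    => $1,
            host    => $2,
            program => $3,
            pid     => $4,
            message => $5,
        );
        # e.g. drop everything from sshd without running a single big RE
        next if $entry{program} eq 'sshd';
        push @KEEP, $line;
    }

Once the line is broken into fields, many of the checks reduce to equality tests or short matches against one field, which is far cheaper than rescanning the whole line.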
However, trying to do anything fancy adds overhead of its own and may end up losing. You pretty much have to try it and see.
But in the end, if you want to do lots of arbitrary checks against lots of arbitrary strings, lots of work will need to be done.