I noticed the same problem in tye's first approach but just ignored it rather than fixing it. And yes, I was on 5.005_03. So I re-ran your benchmark with my data on my laptop and got:

bash-2.01$ perl scanall.pl scantst
Benchmark: timing 5 iterations of BIG_REGEX, CODE_REGEX, MANY_REGEXES, NO_REGEX...
   BIG_REGEX: 138 wallclock secs (133.78 usr +  0.96 sys = 134.74 CPU)
  CODE_REGEX: 532 wallclock secs (517.61 usr +  2.68 sys = 520.29 CPU)
MANY_REGEXES: 644 wallclock secs (626.63 usr +  2.96 sys = 629.59 CPU)
    NO_REGEX: 176 wallclock secs (170.91 usr +  1.41 sys = 172.32 CPU)
bash-2.01$ p56 scanall.pl scantst
Benchmark: timing 5 iterations of BIG_REGEX, CODE_REGEX, MANY_REGEXES, NO_REGEX...
   BIG_REGEX: 167 wallclock secs (160.56 usr +  2.00 sys = 162.56 CPU) @  0.03/s (n=5)
  CODE_REGEX: 171 wallclock secs (166.17 usr +  1.49 sys = 167.66 CPU) @  0.03/s (n=5)
MANY_REGEXES: 241 wallclock secs (232.62 usr +  2.10 sys = 234.72 CPU) @  0.02/s (n=5)
    NO_REGEX: 175 wallclock secs (169.77 usr +  1.34 sys = 171.11 CPU) @  0.03/s (n=5)
bash-2.01$

My dataset is around 4 MB. My laptop is old: 64 MB RAM, 200 MHz CPU, running Linux. The biggest single difference, though, is that my data set is a copy of the script catted against itself until it reached around 4 MB. So most lines match some error.

Therefore your numbers mainly test how quickly it can discard a line; mine test how quickly it can recognize that there is an error and locate it. My regex, being so complex, gives the optimizer virtually no chance to throw lines away early. It also hits Ilya's new tests for some very slow REs, and so it slowed down in the newer version while the other tests sped up quite considerably. (Basically the big RE is being tested for signs of excessive backtracking. The RE is designed specifically to avoid backtracking, but it loses time on the testing.)
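To make the discard-versus-locate distinction concrete, here is a minimal sketch using the standard Benchmark module. The patterns and the two synthetic inputs are made up for illustration and are not taken from scanall.pl; the idea is just to time one alternation against a loop over separate regexes, once on input where most lines match and once where most do not.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical error patterns; the real scanner's patterns are not shown here.
my @patterns = map { qr/$_/ } ('^ERROR', '^WARNING', 'uninitialized', 'not found');

# One alternation built from the same patterns.
my $alt = join '|', @patterns;
my $big = qr/$alt/;

# Two synthetic inputs: 90% matching lines versus 10% matching lines.
my (@mostly_hits, @mostly_misses);
for my $i (1 .. 2000) {
    my $hit  = "ERROR: step $i failed";
    my $miss = "NOTE: step $i ok";
    push @mostly_hits,   ($i % 10 ? $hit  : $miss);
    push @mostly_misses, ($i % 10 ? $miss : $hit);
}

# Count matching lines with the single alternation.
sub count_big {
    my ($lines) = @_;
    return scalar grep { $_ =~ $big } @$lines;
}

# Count matching lines by trying each regex in turn, stopping at the first hit.
sub count_many {
    my ($lines) = @_;
    my $n = 0;
    LINE: for my $line (@$lines) {
        for my $re (@patterns) {
            if ($line =~ $re) { $n++; next LINE }
        }
    }
    return $n;
}

# Run each case for at least one CPU second.
timethese(-1, {
    BIG_HITS    => sub { count_big(\@mostly_hits) },
    BIG_MISSES  => sub { count_big(\@mostly_misses) },
    MANY_HITS   => sub { count_many(\@mostly_hits) },
    MANY_MISSES => sub { count_many(\@mostly_misses) },
});
```

The relative ordering will depend on the perl version and on the patterns, so treat any numbers from this as a way to see the effect of the input mix rather than as a verdict on either approach.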

In reply to RE (tilly) 6: SAS log scanner by tilly
in thread SAS log scanner by nop
