Thanks for the replies so far. Taking out all the /o flags (which were supposed to speed up regexes back when we actually used Perl 5.6; yes, this project is that old, although this code is not) helps a bit. The timings under Perl 5.18.0 and 5.24.1 are now:

woz$ perlbrew use 5.18.0
woz$ time perl -CSD registerRecordDeviceDriver.pl softIoc.dbd

real	0m0.417s
user	0m0.377s
sys	0m0.020s
woz$ perlbrew use 5.24.1
woz$ time perl -CSD registerRecordDeviceDriver.pl softIoc.dbd

real	0m7.549s
user	0m7.215s
sys	0m0.077s

So that's another 2 seconds saved, but it still takes 7 seconds longer than it does under Perl 5.18.0.

@Ken: I'm using our variables because they are actually set in another module. Given that the profiler doesn't show any significant amount of time spent in the Parser::CORE:regcomp opcode (presumably related, but I don't know the internals), I don't think pre-compiling the regexps will make any difference.

Looking through the individual regexp profiles again, I now see that there is one for detecting Perl POD which is taking up almost all of those 7 seconds:

        if (m/\G ( = [a-zA-Z] .* ) \n/xgc) {
            $obj->add_pod($1, parsePod());
        }

Any ideas why this specific regexp is so slow in Perl >= 5.20? It's probably the only one that uses .* to match to the end of a line.
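
One experiment that might be worth trying, as a sketch only: replace .* with the explicit negated class [^\n]*, which matches the same text when /s is not in effect. Whether this avoids the slow path in Perl >= 5.20 is an assumption, not something I have measured. The sample input and the @pod array below are made up for illustration; the real code calls $obj->add_pod() instead.

```perl
use strict;
use warnings;

# Hypothetical sample input: two ordinary lines and one POD directive line.
my $text = "record(ai, foo)\n=head1 NAME\nmore text\n";

my @pod;                 # stand-in for $obj->add_pod(...)
pos($text) = 0;
while ((pos($text) // 0) < length $text) {
    if ($text =~ m/\G ( = [a-zA-Z] [^\n]* ) \n/xgc) {
        # Same capture as the original pattern, with [^\n]* in place of .*
        push @pod, $1;
    }
    elsif ($text =~ m/\G [^\n]* \n/xgc) {
        # Not a POD line: skip to the start of the next line.
    }
    else {
        last;            # no trailing newline left to match
    }
}

print "$_\n" for @pod;
```

Timing the real parser with this one change under both Perl versions would show whether the .* is actually the trigger.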

I tried adding use re "debug"; and it's outputting lots of lines like the following. Given the reference to an anchored substr "=", they probably come from the above match:

  doing 'check' fbm scan, [345261..414818] gave 345300
  Found floating substr "%n" at offset 345300 (rx_origin now 345259)...
  doing 'other' fbm scan, [345259..345299] gave -1
  Contradicts anchored substr "="; about to retry anchored at offset 345301 (rx_origin now 345299)...
  doing 'check' fbm scan, [345301..414818] gave 345317
  Found floating substr "%n" at offset 345317 (rx_origin now 345299)...
  doing 'other' fbm scan, [345299..345316] gave -1
  Contradicts anchored substr "="; about to retry anchored at offset 345318 (rx_origin now 345316)...

Further hints on understanding what those messages mean and how to clean this up would be most appreciated.
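
To cut the debug output down to just this one pattern, the pragma can be confined to a small block, since use re "debug" is lexically scoped. A minimal sketch, with a made-up two-line input and a hypothetical $line variable standing in for the real parser state:

```perl
use strict;
use warnings;

# Hypothetical sample input; the real parser works on the whole .dbd text.
my $text = "foo\n=head1 NAME\n";
pos($text) = 4;          # position of the "=" that starts the second line

my $line;
{
    # Only regexes compiled inside this block emit debug output (to STDERR).
    use re 'debug';
    if ($text =~ m/\G ( = [a-zA-Z] .* ) \n/xgc) {
        $line = $1;
    }
}

print "matched: $line\n" if defined $line;
```

Redirecting STDERR to a file then gives a trace that belongs to the POD regexp alone, which should make it easier to compare the two Perl versions.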

- Andrew

In reply to Re: Parser Performance Question by songmaster
in thread Parser Performance Question by songmaster
