Thanks for the replies so far. Taking out all the /o flags helps a bit. (They were supposed to speed up regexes back when we actually used Perl 5.6; yes, this project is that old, although this code isn't.) The timings under Perl 5.18.0 and 5.24.1 are now:

woz$ perlbrew use 5.18.0
woz$ time perl -CSD registerRecordDeviceDriver.pl softIoc.dbd

real	0m0.417s
user	0m0.377s
sys	0m0.020s
woz$ perlbrew use 5.24.1
woz$ time perl -CSD registerRecordDeviceDriver.pl softIoc.dbd

real	0m7.549s
user	0m7.215s
sys	0m0.077s

So that's another 2 seconds saved, but it still takes 7 seconds longer than it does under Perl 5.18.0.

@Ken: I'm using "our" variables because they are actually set in another module. The profiler doesn't show any significant amount of time spent in the Parser::CORE:regcomp opcode (presumably related, but I don't know the internals), so I don't think pre-compiling the regexps will make any difference.

Looking through the individual regexp profiles again, I now see that there is one for detecting Perl POD which is taking up almost all of that 7 seconds:

if (m/\G ( = [a-zA-Z] .* ) \n/xgc) { $obj->add_pod($1, parsePod()); }

Any ideas why this specific regexp is so slow in Perl >= 5.20? It's probably the only one that uses .* to match to the end of a line.
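One way to narrow this down would be to time the POD rule on its own, outside the parser. The following is only a sketch under assumptions: make_input() and scan() are hypothetical helpers, the generated lines merely stand in for softIoc.dbd (which isn't shown here), and the [^\n]* variant is just an equivalent spelling of .* to compare against, not a confirmed fix.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical stand-in for the real input: many short non-POD lines.
sub make_input {
    my ($lines) = @_;
    return join '', map { "field(F$_, \"value $_\")\n" } 1 .. $lines;
}

# Walk the buffer with \G and /gc the way a lexer would: try the POD rule
# at the current position; if it fails, consume one line and try again.
sub scan {
    my ($text, $pod_rx) = @_;
    my ($pod, $other) = (0, 0);
    for ($text) {
        pos($_) = 0;
        while (pos($_) < length $_) {
            if    (m/$pod_rx/gc)       { $pod++ }
            elsif (m/\G [^\n]* \n/xgc) { $other++ }
            else                       { last }   # no trailing newline
        }
    }
    return ($pod, $other);
}

# Two spellings that accept the same text (without /s, "." never matches \n).
my %variants = (
    original => qr/\G ( = [a-zA-Z] .* ) \n/x,
    no_dot   => qr/\G ( = [a-zA-Z] [^\n]* ) \n/x,
);

my $input = make_input(1000) . "=head1 NAME\n" . make_input(1000);

for my $name (sort keys %variants) {
    my $t0 = time;
    my ($pod, $other) = scan($input, $variants{$name});
    printf "%-9s pod=%d other=%d %.3fs\n", $name, $pod, $other, time - $t0;
}
```

Running the same script under perlbrew with 5.18.0 and 5.24.1 should show whether the slowdown reproduces with this one rule alone, and whether the spelling of the pattern changes anything.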

I tried adding use re "debug"; and it's outputting lots of lines like this, which given the reference to an anchored substr "=" is probably the above match:

doing 'check' fbm scan, [345261..414818] gave 345300
Found floating substr "%n" at offset 345300 (rx_origin now 345259)...
doing 'other' fbm scan, [345259..345299] gave -1
Contradicts anchored substr "="; about to retry anchored at offset 345301 (rx_origin now 345299)...
doing 'check' fbm scan, [345301..414818] gave 345317
Found floating substr "%n" at offset 345317 (rx_origin now 345299)...
doing 'other' fbm scan, [345299..345316] gave -1
Contradicts anchored substr "="; about to retry anchored at offset 345318 (rx_origin now 345316)...

Further hints on understanding what those messages mean and how to clean this up would be most appreciated.
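As far as I understand the terms, "anchored substr" and "floating substr" are fixed strings the regex optimizer has pulled out of the pattern ("=" and the newline here) and looks for before running the full match, and "fbm" is Perl's fast Boyer-Moore string search. To cut the trace down to this one rule, the re pragma can be scoped lexically. This is a minimal sketch: match_pod_line() is a hypothetical wrapper, not code from the parser, and the trace goes to STDERR.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the POD rule so the trace covers one regex only.
sub match_pod_line {
    my ($text) = @_;

    # Lexically scoped: only regexes compiled inside this sub are traced.
    # EXECUTE covers the match-time options, including the start-point
    # optimizer messages; "use re qw(Debug INTUIT)" narrows it to those alone.
    use re qw(Debug EXECUTE);

    return $text =~ m/\G ( = [a-zA-Z] .* ) \n/xgc ? $1 : undef;
}

# This regex sits outside the sub, so it is not traced.
my $untraced = "plain line\n" =~ m/^plain/ ? 1 : 0;

my $line = match_pod_line("=head1 NAME\nrest of file\n");
print defined $line ? "matched: $line\n" : "no match\n";
```

Redirecting STDERR to a file (perl script.pl 2> trace.txt) makes it easier to compare the traces from 5.18.0 and 5.24.1 side by side.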

- Andrew

In reply to Re: Parser Performance Question by songmaster
in thread Parser Performance Question by songmaster
