There is some repetition across your regexes that can be factored out. This may be related to the underlying cause.

Each regex starts with the same pattern: \s* ^ \s*. Consuming that leading white space once, before running the if conditions, makes things about 250-260% faster under Strawberry Perl 5.32, testing with a file of 500 begfoo sets generated using the code in 11128154. See sub parse_foo2 in the code; parse_foo1 is from the OP.
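As a minimal sketch of why the shared prefix can be hoisted (the input string and variable names here are made up for illustration, they are not the OP's data): with /gc, a failed match leaves pos() where it was, so the white space only needs to be consumed once and every alternative can then be tried from the same offset.

```perl
use strict;
use warnings;

# Hypothetical input, for illustration only: six white space characters,
# then a begfoo statement.
my $text = "   \n  begfoo FOO_1 ( a, b );";

# Consume leading white space once. In scalar context /g advances pos().
$text =~ /\G \s*/gcx;
my $after_ws = pos($text);

# A failed match would normally reset pos(); the /c flag keeps it in place.
my $end_matched = ($text =~ /\G endfoo/gcx) ? 1 : 0;
my $after_fail  = pos($text);

# So the next alternative starts from the same offset, with no need to
# re-scan the white space.
my $name;
if ($text =~ /\G begfoo \s+ (\S+)/gcx) {
    $name = $1;
}

print "after ws: $after_ws, after failed match: $after_fail, name: $name\n";
```

Without /c, the failed endfoo match would reset pos() to the start of the string and the begfoo match anchored at \G would then fail on the leading white space.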

I also converted the bare block with redo into a while loop, mostly for style. Adding the /aa flag makes a slight difference, which could just be noise.
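For reference, a small sketch of what /aa changes semantically: it restricts \s (and \d, \w and the POSIX classes) to ASCII. This only shows the matching behaviour; it says nothing about whether that accounts for the timing difference. The flag needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# U+2003 EM SPACE: white space under Unicode rules, but not ASCII.
my $em_space = "\x{2003}";

my $unicode_ws = ($em_space =~ /\s/)   ? 1 : 0;   # Unicode rules apply
my $ascii_ws   = ($em_space =~ /\s/aa) ? 1 : 0;   # \s limited to ASCII
my $plain_ws   = (" "       =~ /\s/aa) ? 1 : 0;   # ASCII space still matches

print "default: $unicode_ws, /aa on U+2003: $ascii_ws, /aa on space: $plain_ws\n";
```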

Note that I have not checked if all begfoo sets are parsed correctly. In particular, parse_foo2 only skips the leading white space and no longer includes the ^ anchor.

I also don't have a Perl 5.8 to test with.

use 5.022;
use warnings;
use Benchmark qw {:all};

# Slurp the whole test file into one string
open my $fh, '<', 'x.txt' or die "Cannot open x.txt: $!";
my $data = do {local $/ = undef; <$fh>};

cmpthese (
    10,
    {
        one => sub {parse_foo1($data)},
        two => sub {parse_foo2($data)},
    }
);

# Original version from the OP: every alternative repeats \s* ^ \s*
sub parse_foo1 {
    my ($text) = @_;
    my $name;
    {
        last if $text =~ /\G \s* \Z/gcmsx;
        if ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) {
            $name = $1;
        }
        elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) {
        }
        elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) {
        }
        else {
            die "ERROR: unknown syntax\n";
        }
        redo;
    }
    print "LAST FOO1: $name\n";
}

# Factored version: leading white space is consumed once per iteration
sub parse_foo2 {
    my ($text) = @_;
    my $name;
    while (not $text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcsmx;  # march through any white space
        if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) {
            $name = $1;
        }
        elsif ($text =~ /\G endfoo /gcmsx) {
        }
        elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) {
        }
        else {
            die "ERROR: unknown syntax\n";
        }
    }
    print "LAST FOO2: $name\n";
}

Example results:

v5.32.0
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
      Rate  one  two
one 2.08/s   -- -72%
two 7.53/s 261%   --

In reply to Re: regex gotcha moving from 5.8.8 to 5.30.0? by swl
in thread regex gotcha moving from 5.8.8 to 5.30.0? by mordibity
