
There is some repetition across your regexes that can be factored out. This may relate to the underlying cause.

Each regex starts with the same pattern, \s* ^ \s*. Matching the leading whitespace once, before running the if/elsif chain, makes things about 250-260% faster under Strawberry Perl 5.32, tested with a file of 500 begfoo sets generated using the code in 11128154. The new version is sub parse_foo2; parse_foo1 is the code from the OP.
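Hoisting the whitespace step works because of /gc: a failed match with /c leaves pos() where it was, so every \G alternative in the if/elsif chain starts from the same place. The minimal sketch below uses a made-up input string to show this, and for contrast shows that without /c a failed //g match resets pos() to undef.

```perl
use strict;
use warnings;

# Made-up input for illustration only.
my $text = "   begfoo FOO_1 ( a, b );\nendfoo\n";

# Consume the leading whitespace once, as parse_foo2 does.
$text =~ /\G \s* /gcx;
my $after_ws = pos($text);                       # 3

# A failed match with /c does not move pos().
my $saw_endfoo = ($text =~ /\G endfoo /gcx) ? 1 : 0;
my $after_fail = pos($text);                     # still 3

# So the next alternative can start matching right at \G.
my $name;
if ($text =~ /\G begfoo \s+ (\S+?) \s* \( /gcx) { $name = $1 }

# For contrast: without /c, a failed //g match resets pos() to undef.
my $copy = $text;                                # pos() is not copied
$copy =~ /\G \s* /gx;
$copy =~ /\G endfoo /gx;
my $after_fail_no_c = pos($copy);                # undef

printf "after ws: %d, endfoo? %d, after failed /gc: %d, name: %s, without /c: %s\n",
    $after_ws, $saw_endfoo, $after_fail, $name,
    defined $after_fail_no_c ? $after_fail_no_c : 'undef';
```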

I also moved the end-of-text check into a while condition, replacing the bare block and redo, mostly for style. Adding the /aa flag to the begfoo regex makes a slight difference, which could just be noise.
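For reference, /a makes \s, \d, \w and the POSIX classes match ASCII characters only, and /aa additionally stops ASCII and non-ASCII characters from matching each other under /i. In the begfoo regex that only changes what \s and \S accept. A small sketch with a made-up statement that uses U+2003 EM SPACE as its separator:

```perl
use strict;
use warnings;

# U+2003 EM SPACE is whitespace under Unicode rules, but not under /a or /aa.
my $em_space   = "\x{2003}";
my $unicode_ws = ($em_space =~ /\s/)   ? 1 : 0;   # 1
my $ascii_ws   = ($em_space =~ /\s/aa) ? 1 : 0;   # 0

# Made-up statement whose only separator after begfoo is an EM SPACE.
my $stmt     = "begfoo\x{2003}FOO_1 ( x );";
my $re_plain = qr/\A begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/x;
my $re_aa    = qr/\A begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/xaa;

my $plain_hit = ($stmt =~ $re_plain) ? 1 : 0;     # 1: EM SPACE counts as \s
my $aa_hit    = ($stmt =~ $re_aa)    ? 1 : 0;     # 0: \s+ needs ASCII whitespace

print "unicode \\s: $unicode_ws, /aa \\s: $ascii_ws\n";
print "begfoo regex without /aa: $plain_hit, with /aa: $aa_hit\n";
```

If the test file is plain ASCII (an assumption), /aa cannot change which lines match there, which is consistent with the difference being small.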

Note that I have not checked whether every begfoo set is parsed correctly...
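One thing worth checking when comparing the two parsers: parse_foo2 skips the whitespace but never tests ^, so unlike parse_foo1 it no longer requires each statement to start on a new line. The sketch below uses a made-up line holding two statements and three hypothetical cut-down subs (foo1_style, foo2_style, hoisted_prefix) that mimic only the endfoo and begfoo steps. The last one hoists the whole \s* ^ \s* prefix instead, which keeps the line-start check; its speed has not been measured here.

```perl
use strict;
use warnings;

# Made-up input: two statements on one line.
my $line = "endfoo begfoo FOO_2 ( x );\n";

# Hypothetical cut-down of parse_foo1: every step requires \s* ^ \s*.
sub foo1_style {
    my ($t) = @_;
    $t =~ /\G \s* ^ \s* endfoo /gcmsx or return 'no endfoo';
    return $t =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( /gcmsx ? $1 : 'rejected';
}

# Hypothetical cut-down of parse_foo2: whitespace skipped once, no ^ anchor.
sub foo2_style {
    my ($t) = @_;
    $t =~ /\G \s* /gcmsx;
    $t =~ /\G endfoo /gcmsx or return 'no endfoo';
    $t =~ /\G \s* /gcmsx;
    return $t =~ /\G begfoo \s+ (\S+?) \s* \( /gcmsx ? $1 : 'rejected';
}

# Hypothetical variant: hoist the whole prefix, ^ included, and treat a
# failed prefix match as a rejection.
sub hoisted_prefix {
    my ($t) = @_;
    $t =~ /\G \s* ^ \s* /gcmsx or return 'rejected';
    $t =~ /\G endfoo /gcmsx or return 'no endfoo';
    $t =~ /\G \s* ^ \s* /gcmsx or return 'rejected';
    return $t =~ /\G begfoo \s+ (\S+?) \s* \( /gcmsx ? $1 : 'rejected';
}

my $r1 = foo1_style($line);       # 'rejected': begfoo is not at a line start
my $r2 = foo2_style($line);       # 'FOO_2': accepted without the ^ check
my $r3 = hoisted_prefix($line);   # 'rejected', like parse_foo1

print "foo1 style: $r1\nfoo2 style: $r2\nhoisted prefix: $r3\n";
```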

I also don't have Perl 5.8 available to test with.

use 5.022;
use warnings;

use Benchmark qw {:all};

open my $fh, '<', 'x.txt' or die "Cannot open x.txt: $!";

my $data = do {local $/ = undef; <$fh>};

cmpthese (
    10,
    {
        one => sub {parse_foo1($data)},
        two => sub {parse_foo2($data)},
    }
);


sub parse_foo1 {
    my ($text) = @_;
    my $name;
    {
        last if $text =~ /\G \s* \Z/gcmsx;

        if     ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) { $name = $1 }
        elsif  ($text =~ /\G \s* ^ \s* endfoo            /gcmsx) { }
        elsif  ($text =~ /\G \s* ^ \s* \S+ \s+  .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }

        redo;
    }
    print "LAST FOO1: $name\n";
}

sub parse_foo2 {
    my ($text) = @_;
    my $name;
    while (not $text =~ /\G \s* \Z/gcmsx) {

        $text =~ /\G \s* /gcsmx;  #  march through any white space
        if     ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) { $name = $1 }
        elsif  ($text =~ /\G endfoo            /gcmsx) { }
        elsif  ($text =~ /\G \S+ \s+  .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }

    }
    print "LAST FOO2: $name\n";
}

Example results:

v5.32.0
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO1: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
LAST FOO2: FOO_500
      Rate  one  two
one 2.08/s   -- -72%
two 7.53/s 261%   --
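The generator from 11128154 is not reproduced here. To rerun the comparison without it, the hypothetical stand-in below writes 500 begfoo sets to x.txt, the file the benchmark script opens. The layout (one begfoo line, one ordinary statement and one endfoo line per set) is only a guess inferred from the three regexes, so timings on this file may differ from the ones above.

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the generator in 11128154; the layout is
# inferred from the regexes in parse_foo1/parse_foo2, not copied from it.
sub make_begfoo_sets {
    my ($count) = @_;
    my $out = '';
    for my $i (1 .. $count) {
        $out .= "begfoo FOO_$i ( arg_$i );\n";   # matched by the begfoo branch
        $out .= "    set x $i;\n";               # matched by the \S+ \s+ .*? ; branch
        $out .= "endfoo\n";                      # matched by the endfoo branch
    }
    return $out;
}

my $data = make_begfoo_sets(500);

open my $out_fh, '>', 'x.txt' or die "Cannot write x.txt: $!";
print {$out_fh} $data;
close $out_fh or die "Cannot close x.txt: $!";
```

On this file both subs should report FOO_500 as the last name, as in the output above.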

In reply to Re: regex gotcha moving from 5.8.8 to 5.30.0? by swl
in thread regex gotcha moving from 5.8.8 to 5.30.0? by mordibity
