There is some repetition across your regexes that can be factored out. This may be related to the underlying cause.
Each regex starts with the same pattern: \s* ^ \s*. Consuming that leading whitespace once, before running the if conditions, makes things about 250-260% faster under Strawberry Perl 5.32, tested with a file of 500 begfoo sets generated using the code in 11128154. See sub parse_foo2 in the code below; parse_foo1 is from the OP.
I also converted the bare block with redo into a while loop, mostly for style. The addition of the /aa flag makes a slight difference, which could just be noise.
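As a minimal sketch of why the whitespace skip can be hoisted: with /g and /c, a scalar-context match records its end offset in pos(), and a failed match leaves that offset untouched, so each later alternative can start with a bare \G. The sample string here is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input, just enough to show the mechanics.
my $text = "  \n  begfoo FOO_1 (a, b);\n";

# Consume leading whitespace once; /g in scalar context records pos().
$text =~ /\G \s* /gcx;
my $after_ws = pos $text;      # offset of the 'b' in 'begfoo'

# A failed alternative does not reset pos() because of /c.
my $saw_endfoo = ($text =~ /\G endfoo /gcx) ? 1 : 0;
my $after_fail = pos $text;

# The next alternative resumes from the same offset.
my $name;
$name = $1 if $text =~ /\G begfoo \s+ (\S+) /gcx;

print "after whitespace: $after_ws\n";    # 5
print "after failed try: $after_fail\n";  # 5
print "name: $name\n";                    # FOO_1
```

Without the /c flag, the failed endfoo match would reset pos() and the following \G would start again from the beginning of the string.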
Note that I have not checked if all begfoo sets are parsed correctly...
I also don't have a version 5.8 to work with.
    use 5.022;
    use warnings;
    use Benchmark qw {:all};

    open my $fh, 'x.txt' or die;
    my $data = do {local $/ = undef; <$fh>};   # slurp the whole file

    cmpthese (
        10,
        {
            one => sub {parse_foo1($data)},
            two => sub {parse_foo2($data)},
        }
    );

    # Version from the OP: every alternative re-matches \s* ^ \s*
    sub parse_foo1 {
        my ($text) = @_;
        my $name;
        {
            last if $text =~ /\G \s* \Z/gcmsx;
            if ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) {
                $name = $1
            }
            elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) {
            }
            elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) {
            }
            else {
                die "ERROR: unknown syntax\n"
            }
            redo;
        }
        print "LAST FOO1: $name\n";
    }

    # Whitespace is consumed once per iteration, then each alternative starts at \G
    sub parse_foo2 {
        my ($text) = @_;
        my $name;
        while (not $text =~ /\G \s* \Z/gcmsx) {
            $text =~ /\G \s* /gcmsx;  #  march through any white space
            if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) {
                $name = $1
            }
            elsif ($text =~ /\G endfoo /gcmsx) {
            }
            elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) {
            }
            else {
                die "ERROR: unknown syntax\n"
            }
        }
        print "LAST FOO2: $name\n";
    }
Example results:
    v5.32.0
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO1: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
    LAST FOO2: FOO_500
          Rate  one  two
    one 2.08/s   -- -72%
    two 7.53/s 261%   --
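Since the parsing itself was not verified above, one way to check it is to have both versions collect every begfoo name and compare the lists. This is only a sketch: names_foo1 and names_foo2 are hypothetical variants of the two subs that return names instead of printing the last one, and the sample input format is an assumption, since the generator from 11128154 is not reproduced here.

```perl
use strict;
use warnings;

# Assumed input shape: begfoo NAME (args); one statement; endfoo.
my $sample = join '',
    map { "begfoo FOO_$_ (a, b);\n  bar baz;\nendfoo\n" } 1 .. 3;

# Same regexes as parse_foo1, but collecting every name.
sub names_foo1 {
    my ($text) = @_;
    my @names;
    {
        last if $text =~ /\G \s* \Z/gcmsx;
        if ($text =~ /\G \s* ^ \s* begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsx) {
            push @names, $1;
        }
        elsif ($text =~ /\G \s* ^ \s* endfoo /gcmsx) { }
        elsif ($text =~ /\G \s* ^ \s* \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
        redo;
    }
    return @names;
}

# Same regexes as parse_foo2, but collecting every name.
sub names_foo2 {
    my ($text) = @_;
    my @names;
    while (not $text =~ /\G \s* \Z/gcmsx) {
        $text =~ /\G \s* /gcmsx;    # skip whitespace once per iteration
        if ($text =~ /\G begfoo \s+ (\S+?) \s* \( \s* (.*?) \s* \) \s* ;/gcmsxaa) {
            push @names, $1;
        }
        elsif ($text =~ /\G endfoo /gcmsx) { }
        elsif ($text =~ /\G \S+ \s+ .*? \s* ;/gcmsx) { }
        else { die "ERROR: unknown syntax\n" }
    }
    return @names;
}

my @one = names_foo1($sample);
my @two = names_foo2($sample);
print "foo1: @one\n";
print "foo2: @two\n";
print "@one" eq "@two" ? "same names\n" : "names differ\n";
```

One difference such a check may surface on other inputs: parse_foo2 drops the ^ anchor, so it accepts a statement that follows another on the same line, where parse_foo1 would die with "unknown syntax".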
In reply to Re: regex gotcha moving from 5.8.8 to 5.30.0?
by swl
in thread regex gotcha moving from 5.8.8 to 5.30.0?
by mordibity