Here's one for regex ninjas.
Let's look at this example, a simple grep:
my $pat = shift; while (<>) { print if /$pat/; }
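As you certainly all know, Perl likes to build the machine for a regex when it compiles the program. A regex with variables in it can't be built until run time, so Perl may have to rebuild the machine each time it uses the regex. (Newer Perls skip the rebuild when the interpolated string is the same as on the previous use, so how much this costs depends on your Perl.) In the worst case, Perl rebuilds the machine every time through the loop, which makes this program really slow.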
You certainly also know that this program can be optimized with the o-flag:
my $pat = shift; while (<>) { print if /$pat/o; }
This tells Perl that $pat never changes. Perl will compile the machine for /$pat/ the first time it uses the regex and then remember it for later.
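To make the "remember it for later" part concrete, here is a minimal sketch of how /o freezes the pattern. The helper names count_with_o and count_plain are made up for this illustration only; both count how many of the given patterns match the fixed string 'bar'.

```perl
use strict;
use warnings;

# Hypothetical helper: the match carries /o, so the pattern is compiled
# from the first value of $pat and never again.
sub count_with_o {
    my $n = 0;
    for my $pat (@_) {
        $n++ if 'bar' =~ /$pat/o;
    }
    return $n;
}

# Hypothetical helper: same loop without /o, so each $pat is honoured.
sub count_plain {
    my $n = 0;
    for my $pat (@_) {
        $n++ if 'bar' =~ /$pat/;
    }
    return $n;
}

print "with /o:    ", count_with_o('foo', 'bar'), "\n";   # prints 0
print "without /o: ", count_plain('foo', 'bar'), "\n";    # prints 1
```

With /o the second pattern is never looked at: the match is still against /foo/, which is why the flag is only safe when $pat really never changes.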
The situation is a bit more complicated if you want to match more than one pattern, e.g.:
my @pats = ('fo*', 'ba.', 'w+3'); while (<>) { foreach my $pat (@pats) { print if /$pat/; } }
Obviously the /../o trick won't work here, because then only the first pattern would be compiled. But this program can be made much faster by joining all patterns into one:
my @pats = ('fo*', 'ba.', 'w+3'); my $pat = join('|', @pats); while (<>) { print if /$pat/o; }
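A small self-contained variant of the joined-pattern idea, as a sketch only. It makes two assumptions that are not in the program above: each alternative is wrapped in (?:...) so that the patterns cannot bleed into each other, and qr// is used instead of /o to compile the combined pattern once. The sample lines in @lines are invented for the demonstration.

```perl
use strict;
use warnings;

my @pats = ('fo*', 'ba.', 'w+3');

# Wrap every pattern in a non-capturing group before joining,
# then compile the whole alternation a single time with qr//.
my $joined = join '|', map { "(?:$_)" } @pats;
my $re     = qr/$joined/;

# Invented sample input standing in for the lines read via <>.
my @lines = ("food\n", "xyz\n", "www3\n");

# Keep every line that matches at least one of the alternatives.
my @matched = grep { /$re/ } @lines;
print @matched;
```

One behavioural difference to keep in mind: the foreach version prints a line once per matching pattern, while the joined version prints each matching line only once.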
So far, so good. Now for my problem. Let's assume we have a little plugin system and plugins can register functions to be called when a given regex matches a line.
Our program will store pairs of </pattern/i, funcref> in a hash %patterns and then basically do something like this:
for my $line (@lines) {
    for my $pattern (keys(%patterns)) {
        if (my @params = ($line =~ $pattern)) {
            my $func = $patterns{$pattern};
            if (defined($func)) {
                $func->(@params);
            }
        }
    }
}
Again, this is very slow, because the pattern changes on every pass through the inner loop, so Perl needs to rebuild the machine for each pattern, for every line. But the problem is: I can't use the trick shown above (join all patterns into one string with '|'), because then I can't tell which pattern in the string matched and so I don't know which function to call.
I tried a lot of different things without success, so now I hope for the expertise of the Perl Monks. How could this mechanism be optimized? Is there any way at all? Perhaps my approach is total bullshit and there's a much better way to do this? I'm looking forward to your ideas!
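One direction that might be worth measuring, as a minimal sketch and not a tested solution for the real plugin system: compile every pattern once with qr// when it is registered, and keep [regex, handler] pairs in an array. An array is used because hash keys are plain strings, so a compiled regex stored as a key would be turned back into a string. The names register, @handlers and @seen and the two sample plugins are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical registry: a list of [compiled regex, handler] pairs.
my @handlers;

# register() is an assumed helper name, not part of the original program.
sub register {
    my ($pattern, $func) = @_;
    # qr// builds the machine once, here, at registration time.
    push @handlers, [ qr/$pattern/i, $func ];
    return;
}

# Two invented plugins that record what they were called with.
my @seen;
register('^hello (\w+)', sub { push @seen, "greet:$_[0]" });
register('(\d+)\+(\d+)', sub { push @seen, 'sum:' . ($_[0] + $_[1]) });

my @lines = ('Hello world', 'nothing here', '2+3 please');

for my $line (@lines) {
    for my $h (@handlers) {
        my ($re, $func) = @$h;
        # Matching against a qr// object reuses the compiled machine.
        if (my @params = ($line =~ $re)) {
            $func->(@params);
        }
    }
}

print "$_\n" for @seen;   # prints greet:world, then sum:5
```

Every pattern is still tried against every line, so this only removes the recompilation cost, not the number of match attempts.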
(Examples taken from http://perl.plover.com/Regex/article.html)
In reply to Optimize a pluggable regex matching mechanism by dredd