... There are in the order of 20,000 texts and 50,000-100,000 patterns,
In general it's much faster to match one large regex than many regexes many times.
That means you could try to assemble $x of the original regexes into a single regex.
Now it seems you have to know which regex matched, which means you have to distinguish them. In perl 5.10.0 or above you can use named captures. If you can't require such a new perl version, you can try something like this instead:
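For the named-capture route, here is a minimal sketch assuming perl 5.10 or later. The helper names (assemble_named, which_matched) and the generated group names (p0, p1, ...) are made up for illustration. Group names must be identifiers, so the real keys are kept in a lookup table:

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: combine patterns into one alternation in which
# each alternative is wrapped in its own named capture group.
sub assemble_named {
    my %regexes = @_;
    my (%name_to_key, @alternatives);
    my $i = 0;
    for my $key (sort keys %regexes) {
        my $name = 'p' . $i++;    # group names must be valid identifiers
        $name_to_key{$name} = $key;
        push @alternatives, "(?<$name>$regexes{$key})";
    }
    my $joined = join '|', @alternatives;
    return (qr/$joined/, \%name_to_key);
}

# Returns the key of the pattern that matched, or undef if none did.
sub which_matched {
    my ($re, $name_to_key, $text) = @_;
    return unless $text =~ $re;
    # %+ only lists named groups that actually took part in the match
    my ($name) = grep { exists $name_to_key->{$_} } keys %+;
    return $name_to_key->{$name};
}

my ($re, $names) = assemble_named(fruit => 'app+le', number => '\d+');
say which_matched($re, $names, 'I ate an apple') // 'no match';
```

The grep is there in case one of the original patterns contains named groups of its own; only the generated wrapper names are looked up.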
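    our $which_matched;
    sub assemble_regex {
        my %regexes = @_;
        # qq[] rather than q[], so that the pattern and its key are interpolated.
        # Compiling the result needs "use re 'eval';" because of the (?{...}) blocks.
        return join '|',
            map { qq[(?:$regexes{$_})(?{\$which_matched='$_'})] }
            keys %regexes;
    }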
This assumes that the keys in %regexes don't contain single quotes or trailing backslashes.
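A sketch of how the assembled pattern might be used. The sample patterns and the match_which helper are invented for illustration. The string is built with qq[] so that each pattern and key is interpolated, and `use re 'eval'` is needed because the (?{...}) blocks reach the regex compiler through run-time interpolation:

```perl
use strict;
use warnings;
use re 'eval';    # allow (?{...}) blocks in patterns interpolated at run time

our $which_matched;

sub assemble_regex {
    my %regexes = @_;
    return join '|',
        map { qq[(?:$regexes{$_})(?{\$which_matched='$_'})] }
        sort keys %regexes;
}

# Compile the combined pattern once, then reuse it for every text.
my $pattern = assemble_regex(fruit => 'app+le', number => '\d+');
my $re      = qr/$pattern/;

# Hypothetical helper: key of the matching pattern, or undef.
sub match_which {
    my ($regex, $text) = @_;
    return $text =~ $regex ? $which_matched : undef;
}

for my $text ('I ate an apple', 'room 42', 'nothing here') {
    my $key = match_which($re, $text);
    print "$text: ", (defined $key ? "matched '$key'" : 'no match'), "\n";
}
```

$which_matched is only meaningful right after a successful match; after a failed match it still holds the value from the previous success.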
If many of the patterns are constant strings, consider upgrading to perl 5.10.0 or later: its regex engine optimizes alternations of literal strings into a trie, which greatly speeds up matching of many constant alternatives.
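For the constant-string case, a minimal sketch of building such an alternation. The helper name and sample strings are made up, and the list is assumed to be non-empty:

```perl
use strict;
use warnings;

# Hypothetical helper: build one alternation from literal strings.
# Assumes at least one string is passed in.
sub assemble_literals {
    my @strings = @_;
    # quotemeta escapes regex metacharacters so each string matches literally
    my $alternation = join '|', map { quotemeta } @strings;
    return qr/($alternation)/;
}

my $literal_re = assemble_literals('foo.bar', 'C++', 'perl');
if ('I like C++ and perl' =~ $literal_re) {
    print "first literal found: $1\n";
}
```

The capture group returns the matched string itself in $1, so no extra bookkeeping is needed to tell which constant was found.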
In reply to Re: Efficient regex matching with qr//; Can I do better?
by moritz
in thread Efficient regex matching with qr//; Can I do better?
by kruppy