I wrote the article. Whether you think it applies to Perl depends on some usually-unstated beliefs about regular expressions and how they should be used.
Regular expressions started out as a notation for describing
the strings that they match:
a(bb)*a says "first an a, then some
number of bb, then an a". They didn't say anything about how
you had to take an input string and check whether it matched,
and in fact there were multiple algorithms used for doing so.
Computationally, regular expressions describe
sets of strings that can be matched in time linear in the input,
using constant space, by an algorithm that isn't obvious from
looking at the regular expression notation.
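To make that not-obvious algorithm concrete, here is a minimal sketch
in Perl (the state numbering and the matches() helper are mine,
invented for illustration): it matches a(bb)*a by tracking the set of
automaton states reachable after each input character, so the work per
character is bounded and there is no backtracking anywhere.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A hand-built automaton for a(bb)*a, run by tracking the set of
    # states reachable after each input character.  One pass over the
    # input, bounded work per character: linear time, constant space.
    my %delta = (
        0 => { a => [1] },            # leading a
        1 => { b => [2], a => [3] },  # start a bb pair, or take the final a
        2 => { b => [1] },            # second b of the pair
    );
    my %accept = (3 => 1);

    sub matches {
        my ($s) = @_;
        my %states = (0 => 1);
        for my $c (split //, $s) {
            my %next;
            for my $q (keys %states) {
                $next{$_} = 1 for @{ $delta{$q}{$c} || [] };
            }
            %states = %next;
            return 0 unless %states;  # no live states: cannot match
        }
        return grep { $accept{$_} } keys %states;
    }

    print "$_: ", (matches($_) ? "match" : "no match"), "\n"
        for qw(aa abba abbbba abbba ab);

Nothing in that loop depends on how complicated the original
expression was; a real engine builds the transition structure from the
expression automatically.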
Perl's regular expressions have come to be viewed as a notation
for an
algorithm that matches strings. Operators like backreferences
started the trend long before Perl, but new operators like recursion
and, to a lesser extent, lookahead and lookbehind assertions
certainly continued the trend quite a ways beyond where it had been.
These features are most easily defined in terms of the underlying
matching, not in terms of the strings they describe.
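Backreferences are the clearest case: (.+)\1 matches exactly the
strings that are some chunk written twice, a language no finite
automaton can recognize, so the operator only makes sense as an
instruction to the matcher. A quick illustration:

    # (.+)\1 describes "something, then that same something again" --
    # a property of the match in progress, not a regular set of strings.
    for my $s (qw(abab abcabc abca)) {
        print "$s: ", ($s =~ /^(.+)\1$/ ? "doubled ($1)" : "not doubled"), "\n";
    }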
And of course, as everyone loves to say, Perl's regular expressions
are not regular expressions anymore. (While this is true, a large fraction of the ones actually written
are true regular expressions and could be treated as such.)
The big difference between these two views is whether you think regular
expressions should be treated as descriptive (here's the effect I want)
or prescriptive (here's how I want you to do it).
People who accept the existence of so-called pathological regular expressions
implicitly treat regular expressions as prescriptive:
"we gave you a lot of rope, so
of course you can manage
to hang yourself."
I believe that until one crosses into the not-too-often-visited
land of exotic Perl regular expression features like
generalized assertions, recursion, and backreferences,
it is best to treat regular expressions as descriptive.
Then the regular expression engine can implement the expression
efficiently no matter what you've typed.
Continuing the strained analogy from before,
"you can have all the rope you need, but the regex engine is going to
do the knot-tying for you once you tell it what you want to do,
so there is no way you can hang yourself."
In the prescriptive regex mindset, programmers have to micro-optimize
their regular expressions, knowing how they will be implemented,
in order to avoid so-called pathological behavior: they must shun
potentially-expensive operations except when they are sure
(and hopefully correct!) that "they won't be expensive in this particular case."
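The article's benchmark is easy to reproduce. This sketch times
a?a?...a?aa...a (n copies of each) against a string of n a's; the
match always succeeds, but a backtracking engine tries the greedy
choice for each a? first, so it explores on the order of 2^n
possibilities before finding the assignment that works (exact numbers
vary by perl version and machine):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Match n optional a's followed by n required a's against n a's.
    # The match succeeds, but a backtracking engine tries the greedy
    # alternatives first, so the time roughly doubles with each
    # increment of n.
    for my $n (20, 22, 24, 26) {
        my $re = ("a?" x $n) . ("a" x $n);
        my $s  = "a" x $n;
        my $t0 = time;
        $s =~ /^$re$/;
        printf "n=%2d: %.2fs\n", $n, time - $t0;
    }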
In the descriptive regex mindset, there are no pathological regular expressions.
If the engine can find a required substring, then fine, let it use
that as a further optimization. But the programmer can be
guaranteed that running a regular expression will not be expensive
unless it uses advanced features like assertions, recursion, or
backreferences. Right now * and ? are potentially-expensive
operators, and they needn't be.
If the Perl programmer is the one writing the regular expressions,
then maybe there isn't much benefit in being spared from learning
how to "optimize" regular expressions.
After all, one has already had to
learn Perl; learning how to
write regular expressions that Perl implements efficiently is not
much more work.
However, if the Perl programmer isn't the one writing the regular
expressions, then there is a lot of benefit to knowing that a regular
expression search isn't going to run for years and years.
As an example, let's consider a CGI script that is going to accept
regular expressions as search keys.
The programmer needs to execute those searches,
and it would be
very useful if Perl could be
relied upon to run them all efficiently, or at least to flag the
ones that might not run efficiently because they have things like recursion.
Perhaps there could be a regex flag that says "run this in linear time,
and if you can't, don't run it at all".
Then it would be plenty safe to take a regexp from the user and
use it to run a search.
But with the current Perl regexes, no way.
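In the meantime, the best a CGI author can do is approximate that
flag by hand. Here is a rough sketch (looks_regular is my name for
it, and the check is a deliberately conservative heuristic, not a
parser). Note that it is not enough by itself under the current
engine: the pattern below passes the check, since it uses only
regular operators, yet still backtracks exponentially.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Conservative heuristic: refuse patterns that look like they use
    # non-regular operators.  It over-rejects (a "(?" inside a
    # character class would be refused too), which is the safe direction.
    sub looks_regular {
        my ($pat) = @_;
        return 0 if $pat =~ /\\[0-9]/;  # backreferences \1, \2, ...
        return 0 if $pat =~ /\(\?/;     # (?=...), (?<=...), (?{...}), recursion
        return 1;
    }

    my $user_pat = '((((.+)+)+)+)+(aaa|bbb)';
    if (looks_regular($user_pat)) {
        print "pattern passes the check -- but only a linear-time\n";
        print "engine makes it actually safe to run\n";
    } else {
        print "rejected: non-regular features\n";
    }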
Lest you think the CGI example is contrived,
consider
Jeffrey Friedl's
Japanese-English dictionary,
which accepts regular expressions as search terms.
The regexp lookup is really slow already (maybe the host machines
are overloaded) but I bet it would be very very slow if you
gave it a regular expression like
((((.+)+)+)+)+(aaa|bbb).
(Cutting off the search isn't quite correct here, since there are some
entries that do match
aaa, at least if you are
searching for English.)
It's great that Perl 5.10 is going to have a hot-swappable regular
expression engine, because then maybe someone could build one
that handles the true-regular-expression notation in guaranteed
linear time and then Perl programmers could
ask for it if they wanted a guarantee of predictable behavior.
Even better, that would pave the way for Perl to check for
the non-regular operators and, if they weren't there,
choose the linear-time implementation automatically.
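As a concrete shape for this, a pluggable engine would be enabled
lexically, the way CPAN's re::engine::RE2 wrapper for the linear-time
RE2 library works (as I understand it, that module hands patterns it
can't handle back to the normal engine). A sketch, assuming that
module is installed:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use re::engine::RE2;  # patterns in this lexical scope compile with RE2

    # RE2 has no backreferences or recursion, so the "pathological"
    # pattern below runs in time linear in each input line instead
    # of backtracking.
    while (my $line = <STDIN>) {
        print $line if $line =~ /((((.+)+)+)+)+(aaa|bbb)/;
    }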
In short, the article is not about making Perl's regex matching
100x faster across all programs. It is about making it more
predictable and consistent across all programs.
And just to be clear,
that is no more incompatible with whatever optimizations you might add
(like exact string search) than backtracking is.