in reply to Cloudflare uses Regexp::Debugger in explaining the outage

The article looks very thorough and detailed from what I've seen. The TL;DR for the regex part (they also give a lot of context) would be this sentence:

what's important is any "real-world" expression (like the complex ones in our WAF rules) that ask the engine to "match anything followed by anything" can lead to catastrophic backtracking.
Basically /.*.*=/ is bad because the first .* will jump to the end of the string and then move back one character at a time, only for the second .* to take that character and start the whole process all over again before the engine gets a chance to check whether the character is =.
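To put rough numbers on that, here is a minimal sketch that counts how often a naive backtracking engine would test for = when matching .*.*= from a single start position. This is a simplified model I made up for illustration (the function name count_steps is hypothetical), not how PCRE or perl actually implement matching, and it assumes the input contains no newlines.

```python
def count_steps(text: str) -> int:
    """Count the '=' checks a naive backtracking engine makes for
    .*.*= when matching from position 0 of text (simplified model)."""
    n = len(text)
    steps = 0
    # The first .* is greedy: it grabs everything, then gives back
    # one character at a time.
    for first_end in range(n, -1, -1):
        # The second .* is greedy too, and starts over from the end
        # of the string every time the first .* gives a character back.
        for second_end in range(n, first_end - 1, -1):
            steps += 1  # one attempt to match '=' at second_end
            if second_end < n and text[second_end] == "=":
                return steps  # overall match found
    return steps  # no '=' anywhere: every combination was tried


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, count_steps("x" * n))
```

In this model a string of n characters without any = costs (n+1)(n+2)/2 checks, so doubling the input roughly quadruples the work, and that is for one start position only. An unanchored search repeats it for each start position.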

I'm kind of surprised that the optimizer wouldn't remove such an obvious problem though. Maybe it's the non-capturing group that prevents the two identical nodes from being merged. Or maybe their version of Perl is too old and newer ones would have optimized that away correctly.

They don't mention it but I think the tool they used to turn the regexes into graphs is debuggex.

Re^2: Cloudflare uses Regexp::Debugger in explaining the outage
by daxim (Curate) on Jul 19, 2019 at 09:02 UTC
    maybe their version of perl is too old
    The regexes were for Lua/PCRE.

      Oh you're right. So that's a case of perl to the rescue then :D.

      I still wonder whether it's their version of Lua or PCRE that is too old, or whether it's missing an optimization.