The article looks very thorough and detailed from what I've seen. The TL;DR for the regex part (they also give a lot of context beyond that) would be this sentence:
> what's important is any "real-world" expression (like the complex ones in our WAF rules) that ask the engine to "match anything followed by anything" can lead to catastrophic backtracking.
Basically, /.*.*=/ is bad because the first .* jumps to the end of the string and then backs off one character at a time, only for the second .* to grab that character and start the whole process over again before the engine ever gets a chance to check whether the character is =. On a string with no = in it, that means trying every way of splitting the text between the two .*s: quadratic work per starting position, and cubic for an unanchored search that retries from every position.
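To make that concrete, here's a toy sketch of the mechanism. It's a hypothetical mini-engine I made up that only understands `.*` and literal characters (nothing like the real Perl/PCRE internals, and its `.` also matches newlines). It counts how many times the matcher is entered while searching a string of n `a`s that contains no `=`:

```python
from typing import List, Tuple

# Hypothetical toy pattern representation (not how Perl or PCRE store regexes):
# ("star", "") stands for .*  and ("lit", c) stands for the literal character c.
Token = Tuple[str, str]


def match_here(tokens: List[Token], ti: int, text: str, pos: int, steps: List[int]) -> bool:
    """Try to match tokens[ti:] starting at text[pos], counting every attempt."""
    steps[0] += 1
    if ti == len(tokens):
        return True  # every token matched
    kind, ch = tokens[ti]
    if kind == "star":
        # Greedy .*: jump to the end of the string, then give back one char at a time.
        # (Toy simplification: '.' matches any character here, including newlines.)
        for end in range(len(text), pos - 1, -1):
            if match_here(tokens, ti + 1, text, end, steps):
                return True
        return False
    # Literal: the current character must be exactly ch.
    if pos < len(text) and text[pos] == ch:
        return match_here(tokens, ti + 1, text, pos + 1, steps)
    return False


def search(tokens: List[Token], text: str) -> Tuple[bool, int]:
    """Unanchored search: retry from every start position, like re.search does."""
    steps = [0]
    for start in range(len(text) + 1):
        if match_here(tokens, 0, text, start, steps):
            return True, steps[0]
    return False, steps[0]


ONE_STAR = [("star", ""), ("lit", "=")]                # .*=
TWO_STARS = [("star", ""), ("star", ""), ("lit", "=")]  # .*.*=

for n in (10, 20, 40):
    text = "a" * n  # no '=' anywhere, so every attempt fails
    print(n, search(ONE_STAR, text)[1], search(TWO_STARS, text)[1])
```

The `.*=` count grows roughly quadratically with n, while the `.*.*=` count grows roughly cubically, because every position the first `.*` gives back gets re-scanned by the second one. Real engines have optimizations this toy lacks, so treat the numbers as an illustration of the shape of the problem, not a benchmark.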
I'm kind of surprised the optimizer doesn't remove such an obvious problem, though. Maybe the non-capturing group prevents the two identical nodes from being merged. Or maybe their version of Perl is too old, and newer ones would optimize it away correctly.
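For what it's worth, the merge I have in mind wouldn't change what gets matched. Here's a quick sanity check using Python's `re` module (my choice for illustration, not what they run): it compares the spans `re.search` returns for the redundant and simplified patterns on a few made-up sample strings. `same_matches` and `SAMPLES` are just hypothetical names, and this is a spot check, not a proof of equivalence:

```python
import re
from typing import List

# Made-up sample inputs, including one with a newline (where '.' stops matching).
SAMPLES = ["", "=", "a=b", "abc", "==", "x\ny=z", "no equals here", "k=v; k2=v2"]


def same_matches(p1: str, p2: str, samples: List[str]) -> bool:
    """True if re.search finds the same span (or no match) for both patterns on every sample."""
    for s in samples:
        m1, m2 = re.search(p1, s), re.search(p2, s)
        span1 = m1.span() if m1 else None
        span2 = m2.span() if m2 else None
        if span1 != span2:
            return False
    return True


print(same_matches(r".*.*=", r".*=", SAMPLES))            # two adjacent .* vs one
print(same_matches(r".*(?:.*=.*)", r".*=.*", SAMPLES))    # .* before a non-capturing group
```

Both comparisons come out the same on these samples, which is why collapsing the duplicate `.*` looks like a safe rewrite in principle, group or no group.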
They don't mention it, but I think the tool they used to turn the regexes into graphs is Debuggex.