I'm looking for an approach that validates floats, or any other complex 'thing' embedded in a string, with a single regex rather than the two-step approach in the example
Generally, matching "x but not y" is much harder than matching "x" on its own. The "float but not date" example is a fairly simple case: you can express it as / (?<! [-+.\d]) $re_float (?! [.\d]) /x, but quite a bundle of knowledge about the logic of a float is distilled into that preamble and postamble. Automating that distillation for a generic "this complex thing (but not this other complex thing)" is likely to be somewhere between impossible and unprofitable.
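To make the preamble/postamble idea concrete, here is a minimal sketch. The definition of $re_float is an assumption (the original pattern is not shown here), extract_floats is a hypothetical helper name, and the sample text assumes dates written with dots.

```perl
use strict;
use warnings;

# Assumed float pattern; the original $re_float is not shown in the post.
my $re_float = qr/
    [-+]?                               # optional sign
    (?: \d+ (?: \. \d* )? | \. \d+ )    # digits with optional fraction, or a bare fraction
    (?: [eE] [-+]? \d+ )?               # optional exponent
/x;

# Hypothetical helper: return every float in $text that is not part of a
# longer dotted or signed run such as 15.01.2023 or 1.2.3.
sub extract_floats {
    my ($text) = @_;
    return $text =~ / (?<! [-+.\d] ) ($re_float) (?! [.\d] ) /xg;
}

my @floats = extract_floats('on 15.01.2023 pi was 3.14, offset -2.5e3, ratio .5');
print "$_\n" for @floats;
```

Note how much the lookarounds depend on the shape of the thing being excluded: with this float pattern, a hyphenated date such as 2023-01-15 would still yield "2023", because "-" is not in the postamble's character class.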
I haven't looked at Friedl since shortly after the first edition was published; I'd certainly recommend having a look through all of perlre and having a play with any construct that is new to you.
More generally: context is everything. What is faster in one context is often slower in a different context. So if you have a problem you're trying to solve for which your existing solution isn't as fast as you want, you should provide it (or something like it) as the benchmark. If you're looking for something that is always better regardless of context, I don't think you'll find it.
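As a sketch of what such a benchmark could look like, the core Benchmark module can compare two candidates on your own data. Everything here is illustrative: two_step and single_pass are hypothetical stand-ins (not the code from the original question), $re_float is an assumed float pattern, and the sample text is made up. Substitute your real solution and representative input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed float pattern, for illustration only.
my $re_float = qr/ [-+]? (?: \d+ (?: \. \d* )? | \. \d+ ) (?: [eE] [-+]? \d+ )? /x;

# Hypothetical two-step approach: split into tokens, then validate each one.
sub two_step {
    my ($text) = @_;
    return grep { /\A $re_float \z/x } split /[\s,;]+/, $text;
}

# Hypothetical single-regex approach using lookarounds.
sub single_pass {
    my ($text) = @_;
    return $text =~ / (?<! [-+.\d] ) ($re_float) (?! [.\d] ) /xg;
}

# Made-up sample data; use input that resembles your real workload.
my $line   = 'on 15.01.2023 pi was 3.14, offset -2.5e3, ratio .5';
my $sample = join ' ', ($line) x 200;

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    two_step    => sub { my @found = two_step($sample) },
    single_pass => sub { my @found = single_pass($sample) },
});
```

Which candidate wins will depend on the data and the perl version, which is the point: the numbers only mean something for the context you measured.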
For more complex parsing tasks I would also recommend looking at Regexp::Grammars. Making such a grammar fast can take some fiddling, but grammars make complexity a lot easier to deal with.