in reply to Re: Best practice validating numerics with regex?
in thread Best practice validating numerics with regex?

By 'efficient' I mean whether an alternative regular expression benchmarks faster than an existing solution. I'm looking for an approach that validates floats, or any other complex 'thing' embedded in a string, with a single regex rather than the two-step approach in the example. And no, I have not (yet) attempted a solution that extracts multiple float candidates from a single string (/g is the likely tool). An example is just that: something one can build on once one understands the limitations of one approach and the additional capabilities of an alternative. I tried to make it clear in my write-up that I am building on the experience and knowledge gained from studying Friedl, without access to anything later (he used 5.8.8) or more advanced than Friedl. Cookbook, 2nd takes the regex technology only up to 5.14, so it too misses the mark on illuminating the regex state of the art. More to do...

Re^3: Best practice validating numerics with regex?
by hv (Prior) on Oct 17, 2023 at 02:38 UTC

    I'm looking for an approach that validates floats, or any other complex 'thing' embedded in a string, with a single regex rather than the two-step approach in the example

    Generally matching "x but not y" is much harder than matching "x" on its own. The "float but not date" example is a fairly simple case: you can express it as / (?<! [-+.\d]) $re_float (?! [.\d]) /x, but there's quite a bundle of knowledge about the logic of a float getting distilled into that preamble and postamble. Automating that distillation for a generic "this complex thing (but not this other complex thing)" is likely to be somewhere between impossible and unprofitable.
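A minimal sketch of the guarded pattern described above. The definition of $re_float and the helper name find_floats are assumptions made for illustration; the thread never shows the real $re_float, so substitute your own.

```perl
use strict;
use warnings;

# Hypothetical float pattern (an assumption, not the original $re_float):
# optional sign, digits with optional fraction or a bare fraction,
# optional exponent.
my $re_float = qr/ [-+]? (?: \d+ (?: \. \d* )? | \. \d+ ) (?: [eE] [-+]? \d+ )? /x;

# The preamble rejects a start in the middle of a number or dotted run;
# the postamble rejects a float that is followed by more digits or dots.
my $re_guarded = qr/ (?<! [-+.\d] ) ($re_float) (?! [.\d] ) /x;

# Hypothetical helper: call in list context to get every guarded float.
sub find_floats {
    my ($str) = @_;
    return $str =~ /$re_guarded/g;
}

for my $str ('pi is 3.14 roughly', 'released 2023.10.17', 'a 1.5 b -2 c') {
    my @found = find_floats($str);
    printf "%-22s => %s\n", $str, @found ? join(', ', @found) : '(none)';
}
```

The dotted date yields no match at all because every possible starting position is either preceded or followed by a digit or a dot, which is exactly the float-specific knowledge the preamble and postamble encode.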

    I haven't looked at Friedl since shortly after the first edition was published; I'd certainly recommend having a look through all of perlre and having a play with any construct that is new to you.

    More generally: context is everything. What is faster in one context is often slower in a different context. So if you have a problem you're trying to solve for which your existing solution isn't as fast as you want, you should provide it (or something like it) as the benchmark. If you're looking for something that is always better regardless of context, I don't think you'll find it.
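As a rough sketch of what "provide it as the benchmark" could look like, the core Benchmark module can compare two candidates on your own data. The sample strings and the sub names two_step and one_step are hypothetical; the two subs agree on these samples but are not equivalent in general, and the timings depend entirely on the input and the perl in use.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data; replace with strings from the real workload.
my @strings = ('value=3.14;', 'ver 1.2.3 here', 'temp -40 C', 'no numbers');

# Anchored validator in the style of the original post.
my $re_anchored = qr/ ^ (?! (?: .* \. ){2,} ) [+-]? [\d.]+ $ /x;

# Two-step: cast a broad net, then validate the isolated candidate.
sub two_step {
    my ($str) = @_;
    return undef unless $str =~ / ( [+-]? [\d.]+ ) /x;
    my $candidate = $1;
    return $candidate =~ $re_anchored ? $candidate : undef;
}

# One-step: a single regex with lookaround guards instead of anchors.
sub one_step {
    my ($str) = @_;
    return $str =~ / (?<! [-+.\d] )
                     ( [+-]? (?: \d+ (?: \. \d* )? | \. \d+ ) )
                     (?! [.\d] ) /x ? $1 : undef;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    two_step => sub { two_step($_) for @strings },
    one_step => sub { one_step($_) for @strings },
});
```

Swapping in a different @strings is the point of the exercise: a pattern that wins on short lines may lose on long ones with many near-misses.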

    For more complex parsing tasks I would also recommend looking at Regexp::Grammars. Making such a grammar fast can take some fiddling, but they make complexity a lot easier to deal with.

Re^3: Best practice validating numerics with regex?
by NERDVANA (Priest) on Oct 17, 2023 at 00:26 UTC

    It's fairly easy to create a single high-performance regex that will capture the first (or every) valid float in a string. I would think that the main reason to use two regexes (one to cast a broad net, and one to validate it) would be to helpfully report syntax errors instead of skipping over them and reporting a more generic error. Is that why you're trying to do this?

    I'm also not really clear on your question. (But then, I don't have the book you are referencing.)

    my $lookAhead = qr/ (?! (?: .*\.){2,}) /x;
    my $regex = qr/ ^ $lookAhead [+-]? [\d.]+ $/x;
    ...
    for my $str (@strings) {
        say "\$str => $str";
        if ($str =~ / [+-]?[\d.]+ /x) { # Pattern fails without this step; why???
            if ($& =~ $regex) {

    Your $regex uses '^' and '$', so of course you would need to load the digits into an isolated string first, so I'm guessing I don't understand the question. Could you show an example of the code construct that fails that you think should succeed?
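To illustrate the point about '^' and '$', here is a small sketch built from the two patterns quoted above. The helper names validate_whole and validate_embedded and the sample strings are made up for the demonstration, and a capture group stands in for $&.

```perl
use strict;
use warnings;

# The two patterns from the quoted code, unchanged in meaning.
my $lookAhead = qr/ (?! (?: .*\. ){2,} ) /x;
my $regex     = qr/ ^ $lookAhead [+-]? [\d.]+ $ /x;

# Hypothetical helper: true only if the ENTIRE string passes $regex.
sub validate_whole {
    my ($str) = @_;
    return $str =~ $regex ? 1 : 0;
}

# Hypothetical helper: isolate a candidate first, then validate it alone.
sub validate_embedded {
    my ($str) = @_;
    return 0 unless $str =~ / ( [+-]? [\d.]+ ) /x;
    return validate_whole($1);
}

for my $str ('3.14', 'value=3.14;', 'ver 1.2.3') {
    printf "%-12s whole=%d embedded=%d\n",
        $str, validate_whole($str), validate_embedded($str);
}
```

The anchored pattern rejects 'value=3.14;' outright because '^' forces the match to start at 'v', while the extracted candidate '3.14' passes on its own. The string 'ver 1.2.3' fails both ways, since the isolated candidate '1.2.3' contains two dots and trips the negative lookahead.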

Re^3: Best practice validating numerics with regex?
by perlboy_emeritus (Scribe) on Oct 17, 2023 at 00:12 UTC

    Oops, my bad. I wrote this comment re my definition of 'efficient' without logging in, so it is cataloged under anonymous rather than me, perlboy_emeritus. Perhaps some kind soul with admin rights can attach my real ID to that post. And perhaps I'm overstepping the purpose of perlmonks.org? I'm looking for an interesting discussion of ways and means rather than a single solution to a pending problem. Perhaps that is not what perlmonks.org is for, and if I am out of line, I will stop posting these questions.

    Will

      And perhaps I'm overstepping the purpose of perlmonks.org? I'm looking for an interesting discussion of ways and means rather than a single solution to a pending problem. Perhaps that is not what perlmonks.org is for, and if I am out of line, I will stop posting these questions.

      You are not overstepping the purpose of perlmonks.org. That Perl Monks is a very different place to Stack Overflow is indicated by this classic quote from Perl Monks pioneer tye:

      Most languages are like stackoverflow: I have a question, I want the best answer. Perl is like PerlMonks: I have a doubt, I want to read an interesting discussion about it that is likely to go on a tangent. q-:

      To improve your regex, as noted by Discipulus here, I suggest you check out every node written by tybalt89 ... oh, and given it provides a "single regular expression that defines a set of independent subpatterns suitable for matching entire Perl documents", you might also enjoy studying PPR, written by TheDamian.
