in reply to Re: Best practice validating numerics with regex?
in thread Best practice validating numerics with regex?

Extraordinary! It even handles multiple float candidates in one string (/g), the point made by hv. You've sent me off on a new tangent :-) However, I hate to admit this, but I have never used bitwise operations on strings, and so far the net is not a good source of examples, and perldoc List::AllUtils is a source of frustration. I've been stepping through a subset of your code in the debugger to get a handle on ' |. ', which I think ORs two test strings together to get the longest for the printf "%*s" width expression; 36 is definitely the longest in that array. If you would, please explain this expression for someone naive with respect to bitwise string operations?

my $leftside = length reduce {$a |. $b->[0]} @floats; # auto-adjust

I think, if the next string is longer than the previous, reduce pads the length for the next test, so the char values themselves, which are Unicode, aren't themselves relevant. When the iteration finishes $leftside contains the length of the longest Unicode string. I finally gave up noodling re what the significance of ORing two chars might be, other than non-ASCII Unicode characters can be multi-byte, and settled on the notion of building up the longest 'dingus' from the set of 'dinguses'. Is that the idea?

I would not have thought to use a bitwise operation to calculate the longest length of a set of strings though I guess that is one of the reasons why List::Util exists. My first thought was to use:

my $leftside = length reduce { length($a) >= length($b->[0]) ? $a : $b->[0] } @floats;
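
One caveat applies to both reduce forms above: when no initial value is given, reduce uses the first list element as the initial $a, and here that element is an array reference, not a string. Its stringified form (something like ARRAY(0x...)) then takes part in the comparison. Here is a minimal sketch of a seeded variant, assuming @floats holds [ string, label ] pairs; the data is made up for illustration, and core List::Util stands in for List::AllUtils.

```perl
use strict;
use warnings;
use feature 'bitwise';          # string-only |. operator (stable since Perl 5.28)
use List::Util qw( reduce );    # core module that provides reduce

# Hypothetical stand-in for the thread's @floats: [ string, label ] pairs.
my @floats = (
    [ '1.5',                              'valid'   ],
    [ 'a much longer test string 3.25e7', 'valid'   ],
    [ '113.35.120.255',                   'invalid' ],
);

# Seed with '' so $a is always a plain string, never the first array ref.
my $by_or     = length reduce { $a |. $b->[0] } '', @floats;
my $by_length = length reduce { length($a) >= length($b->[0]) ? $a : $b->[0] } '', @floats;

print "or: $by_or, length: $by_length\n";
```

Both variables should end up holding the same number, the length of the longest first element.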

Or, if I got the purpose of that expression wrong, please point me to a reading assignment, other than perldoc List::AllUtils?

Thanks tybalt89 for a very interesting example.

Will

U P D A T E 10/18/2023

Thank you dasgar for the insights. I used the term 'Extraordinary'; tybalt89 is brilliant, and of course, Perl is the eighth wonder of the known world. "use feature 'bitwise'" introduces |., which always ORs its operands as strings (plain | then always treats them as numbers). The result of a string OR is as long as the longer operand, so taking length of the reduced string gives the longest length in the set. One caution: length, sprintf and printf count code points, not graphemes (the user-visible notion of a character), so for text with combining characters the computed width can differ from what appears on screen. Both the length comparison and reduce with |. yield the same answer. I expect tybalt89's method to be the faster of the two, perhaps considerably, but I have not benchmarked his against mine.
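
For anyone who wants to check the speed question rather than guess, here is a minimal sketch using the core Benchmark module. The test data and the labels in %candidates are made up for illustration, and the relative timings will depend on the Perl version and the data, so no result is claimed here.

```perl
use strict;
use warnings;
use feature 'bitwise';
use List::Util qw( reduce max );
use Benchmark qw( cmpthese );

# Made-up test data: 1000 strings with lengths between 1 and 36.
my @strings = map { 'x' x (1 + $_ % 36) } 1 .. 1000;

my %candidates = (
    string_or  => sub { length reduce { $a |. $b } @strings },
    max_length => sub { max map length, @strings },
    by_length  => sub { length reduce { length($a) >= length($b) ? $a : $b } @strings },
);

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, \%candidates );
```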

The really interesting example of brilliance is that regular expression. It uses a branch reset, and as you all probably know, a branch reset ensures that whichever alternative within it matches, its capture goes to the same $n variable. There are two sets of naked parens in that regex within the alternation, and whichever matches will be saved to $1. Now here is a piece of brilliant regex coding that blows my mind. This alternation fragment:

  (?:(?:\d+\.){2,}\d+)()

is looking for invalid decimal expressions, such as IPs, as in 113.35.120.255, which are not floats but look like floats. These are to be excluded if present, and because of the branch reset, the empty () puts only an empty string in $1 when this alternative matches. That is effectively the logical equivalent of a negative look-ahead, and I suspect it is faster and more efficient than using (?!...).
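
To make the idea concrete, here is a minimal sketch of a branch reset with an empty capture. It is not tybalt89's actual regex: the second alternative (a bare digits.digits float) and the sample string are assumptions chosen only to show the mechanism.

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern: both alternatives write to $1.
my $re = qr/(?| (?:(?:\d+\.){2,}\d+) ()   # dotted run such as an IP: $1 is ''
              | (\d+\.\d+)                # plain float: $1 is the number
            )/x;

my $str = 'ip 113.35.120.255 pi 3.14 e 2.718';

# In list context /g returns $1 for every match; drop the empty ones.
my @numbers = grep { length } $str =~ /$re/g;

print "@numbers\n";   # prints: 3.14 2.718
```

The dotted run is consumed whole by the first alternative, so no piece of it can be picked up later as a float.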

I posted this question in hopes of generating discussion and I got more than I bargained for; an elegant lesson in Perl magic from a master. Thank you tybalt89 for a welcome dose of enlightenment.

Will

Re^3: Best practice validating numerics with regex?
by dasgar (Priest) on Oct 17, 2023 at 22:45 UTC

    I probably shouldn't be responding because I don't fully understand tybalt89's code. I think I understand part of it, but not sure I understand it well enough to try to explain to anyone.

    In looking at the my $leftside line, I tried working my way from inside out.

    For the |., I found the documentation for Bitwise String Operators. It looks like the combination of use feature 'bitwise' and |. means that bitwise string OR operation was used. I don't fully understand bitwise string OR, but from that documentation it looks like the result is a string that has the same length as the longer of the two strings used in the operation. And in tybalt89's code, the resulting string is not important - only the length of it is important.
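
    A minimal sketch of that behaviour, using two made-up literals: the characters are OR'ed pairwise and the shorter operand is treated as if it were padded with zero bits, so the result is as long as the longer operand.

```perl
use strict;
use warnings;
use feature 'bitwise';   # string-only |. operator (stable since Perl 5.28)

# 'a'|'d' is 'e', 'b'|'e' is 'g', and 'c' has nothing to pair with.
my $or = 'abc' |. 'de';

printf "%s (length %d)\n", $or, length $or;   # prints: egc (length 3)
```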

    The next level is the reduce function. I think I get the gist of what's happening, but I'm not sure that I can explain it well. In the code, @floats is an AoA structure. I think that the reduce function here is being used with the bitwise string OR operator to find the longest string among the first elements of the second-level arrays. (By second-level array, I am referring to the level that has 'valid' and 'invalid' strings as the second element.)

    After the reduce function does its work, then the length of the final resulting string is assigned to the $leftside variable. In the printf statement, the $leftside variable is used to create a right-justified 'field' where the $str variable (the first element of the second level of the @floats AoA data) is printed.
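
    A minimal sketch of the "%*s" part, with a made-up width and made-up values: the * takes the field width from the argument list, and the string is right-justified in that field.

```perl
use strict;
use warnings;

my $leftside = 8;   # hypothetical width; in the thread it comes from reduce

# The * consumes $leftside as the width, then %s consumes '3.14'.
my $line = sprintf "%*s => %s", $leftside, '3.14', 'valid';

print "$line\n";   # prints '3.14' padded on the left to 8 columns
```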

    I admit that I'm getting lost with the regex due to my low level skill/knowledge with regexes. Treating that as a black box and looking at the inputs and outputs, it seems like the regex is pulling out valid float number values from the $str variable to put into the @numbers array, which in turn is used in the printf statement.

    I probably didn't accurately describe things, but I tried to explain what I think I understand about tybalt89's code. Not sure if it helps you to gain a better understanding of the code or not.

Re^3: Best practice validating numerics with regex?
by tybalt89 (Monsignor) on Oct 21, 2023 at 13:47 UTC

    In the spirit of TIMTOWTDI, here are a couple of ways to get the length of the longest string.

    #!/usr/bin/perl

    use strict; # https://www.perlmonks.org/?node_id=11155013
    use warnings;
    use feature 'bitwise';
    use List::AllUtils qw( reduce max );

    $SIG{__WARN__} = sub { die @_ };

    my $longest;
    my @strings = split ' ', <<END;
    one two three four five six seven eight nine ten
    END

    $longest = max map length, @strings; # maybe simplest
    print "longest: $longest\n";

    $longest = max map y///c, @strings; # golf trick, one char shorter :)
    print "longest: $longest\n";

    $longest = length reduce { $a |. $b } @strings; # or'ing strings
    print "longest: $longest\n";

    $longest = reduce { max $a, length $b } 0, @strings; # internal max()
    print "longest: $longest\n";

    # takes a lot of length()'s
    $longest = length reduce { length($a) >= length($b) ? $a : $b } @strings;
    print "longest: $longest\n";

      Bitwise string assignment is another option.

      Both versions below do the same thing. The second is just a bit more structurally similar to the other cases since it's on one line.

      my $str;
      $str |.= $_ for @strings; # more or'ing strings
      $longest = length $str;
      print "longest: $longest\n";

      $longest = length do {my $s; $s |.= $_ for @strings; $s}; # more or'ing strings
      print "longest: $longest\n";

      Edit - fixed typo.