in reply to Re: Best practice validating numerics with regex?
in thread Best practice validating numerics with regex?
Extraordinary! It even handles multiple float candidates in one string (via /g), the point made by hv. You've sent me off on a new tangent :-) However, I hate to admit it, but I have never used bitwise operations on strings, and so far the net has not been a good source of examples, while perldoc List::AllUtils has been a source of frustration. I've been stepping through a subset of your code in the debugger to get a handle on ' |. ', which I think ORs two test strings together to find the longest one for the "%*s" width in the print; 36 is definitely the longest in that array. If you would, please explain this expression for someone naive about bitwise string operations:
my $leftside = length reduce {$a |. $b->[0]} @floats; # auto-adjust
I think that if the next string is longer than the previous one, reduce pads the length for the next test, so the char values themselves, which are Unicode, aren't relevant. When the iteration finishes, $leftside contains the length of the longest Unicode string. I finally gave up noodling over what the significance of ORing two chars might be, other than that non-ASCII Unicode characters can be multi-byte, and settled on the notion of building up the longest 'dingus' from the set of 'dinguses'. Is that the idea?
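For anyone else new to this, here is a minimal sketch of what I understand |. to do, based on perlop: the OR is applied character by character, and the shorter operand acts as if it were padded on the right with NUL characters, so the result is always as long as the longer operand. The sample strings and the @strings array are my own, not tybalt89's data.

```perl
use v5.28;    # turns on strict and the 'bitwise' feature, so |. is string OR
use warnings;
use List::Util qw(reduce);

# 'a' | 'A' is 'a' and 'b' | 'B' is 'b'; 'C' and 'D' are ORed with NUL.
my $or = 'ab' |. 'ABCD';
say "$or has length ", length $or;    # abCD has length 4

# Hypothetical sample data: plain strings rather than the thread's @floats.
my @strings  = ('3.14', '-1.0e-10', '113.35.120.255');
my $leftside = length( reduce { $a |. $b } @strings );
say "width = $leftside";              # width = 14

# The accumulated string is garbage as text; only its length is used.
printf "%*s|\n", $leftside, $_ for @strings;
```

One caveat, as far as I can tell from perlop: since Perl 5.28 the string bitwise operators refuse operands containing code points above 0xFF, so this trick suits ASCII or Latin-1 test strings rather than arbitrary Unicode.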
I would not have thought to use a bitwise operation to calculate the longest length of a set of strings, though I guess that is one of the reasons why List::Util exists. My first thought was to use:
my $leftside = length reduce { length($a) >= length($b->[0]) ? $a : $b->[0] } @floats;
Or, if I got the purpose of that expression wrong, please point me to a reading assignment, other than perldoc List::AllUtils?
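A side-by-side sketch of the approaches, under my assumption that @floats holds array refs with the test string in slot 0 (the data below is made up). Note that reduce hands the first list element to $a on the first call, so I seed the list with '' here to make sure $a starts out as a plain string rather than an array ref; I may be misreading how the original code arranges this.

```perl
use v5.28;    # strict plus the 'bitwise' feature
use warnings;
use List::Util qw(reduce max);

# Hypothetical stand-in for the thread's @floats.
my @floats = ( ['3.14'], ['-1.0e-10'], ['113.35.120.255'] );

# String OR: the accumulator grows to the length of the longest string.
my $by_or = length( reduce { $a |. $b->[0] } '', @floats );

# Length comparison: keep whichever string is longer.
my $by_cmp = length(
    reduce { length($a) >= length($b->[0]) ? $a : $b->[0] } '', @floats
);

# Plain max over the lengths, with no reduce at all.
my $by_max = max map { length($_->[0]) } @floats;

say "$by_or $by_cmp $by_max";    # 14 14 14
```

All three give the same width for this data; which one is fastest would need a benchmark.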
Thanks tybalt89 for a very interesting example.
Will
U P D A T E 10/18/2023
Thank you dasgar for the insights. I used the term 'Extraordinary'; tybalt89 is brilliant, and of course, Perl is the eighth wonder of the known world. "use feature 'bitwise'" introduces |., which always ORs its operands as strings, character by character (plain | then always treats its operands as numbers). Why is this useful here? Because the result of |. is as long as the longer operand, and length, sprintf and printf all determine length attributes in codepoints rather than graphemes (the user-visible notion of a character), so counting graphemes could yield the wrong width. Both the length comparison and reduce with |. yield the same answer. I expect tybalt89's method of determining the longest length of the test strings in the array to be the more 'efficient' of the two, perhaps by a wide margin, but I have not benchmarked his versus mine.
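To make the codepoint versus grapheme distinction concrete, here is a small sketch; the sample string is my own and is not part of the thread's test data.

```perl
use strict;
use warnings;

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two codepoints.
my $s = "e\x{301}";

my $codepoints = length $s;             # length counts codepoints
my $graphemes  = () = $s =~ /\X/g;      # \X matches one grapheme cluster
my $padded     = sprintf '%4s', $s;     # the width is counted in codepoints too

printf "codepoints=%d graphemes=%d padded_length=%d\n",
    $codepoints, $graphemes, length $padded;
# prints: codepoints=2 graphemes=1 padded_length=4
```

So a width computed with length agrees with what sprintf and printf pad to, even when the string would look one column narrower on screen.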
The really interesting example of brilliance is that regular expression. It uses a branch reset, and as you all probably know, a branch reset ensures that whichever alternative within it matches is captured to the same $n variable. There are two sets of naked parens in that regex within the alternation, and whichever matches will be saved to $1. Now here is a piece of brilliant regex coding that blows my mind. This alternation fragment:
(?:(?:\d+\.){2,}\d+)()
is looking for invalid decimal expressions, such as IPs, as in 113.35.120.255, which are not floats but look like floats. These are to be excluded if present, and because of that branch reset, the empty () captures only an empty string into $1. That is effectively the logical equivalent of a negative look-ahead, but certainly is faster and more efficient than using (?!...
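A minimal sketch of how I read that fragment. Only the first alternative comes from the thread; the second alternative and the sample text are my own simplified stand-ins, not tybalt89's actual pattern.

```perl
use strict;
use warnings;

# Hypothetical, simplified pattern for illustration only.
my $re = qr/
    (?|                            # branch reset, groups renumber per alternative
        (?:(?:\d+\.){2,}\d+) ()    # dotted run such as an IP, then an empty group
      |
        (\d+\.\d+)                 # plain decimal, captured
    )
/x;

my $text   = 'host 113.35.120.255 has load 3.14';
my @found  = $text =~ /$re/g;          # one capture per match, in order
my @floats = grep { length } @found;   # drop the empty captures

print "found: [", join('][', @found), "]\n";    # found: [][3.14]
print "floats: @floats\n";                      # floats: 3.14
```

The IP is consumed whole by the first alternative, so the float alternative never gets a chance to match pieces of it, and the empty capture is easy to filter out afterwards.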
I posted this question in hopes of generating discussion and I got more than I bargained for; an elegant lesson in Perl magic from a master. Thank you tybalt89 for a welcome dose of enlightenment.
Will