Greetings Monks, especially hippo :-)
I had no idea when I posted this question that it would attract so many Monks, or that so much of the discussion would be about grouping. The inference I draw is that I offended Perl sensibilities by using () when it wasn't necessary. My bad, but that deserves a bit of discussion, especially since hippo included this expression in his rejoinder, to wit:
use 5.020; # for best efficiency on $&
I don't change my perl very often, and for documentation I depend most heavily on 'Programming Perl', 3rd and 4th editions. In the early days I used whatever perl was installed on my employer's or client's machines. When I first put perl up on a personal machine it was 5.6.1, consistent with PP 3rd, and then 5.18 (close to PP 4th's 5.16). Later I jumped to 5.34 and now 5.36, but I still rely on PP, which advises against using any of $`, $& and $'. I used them only when debugging a difficult regex, and now I just use Regexp::Debugger for that. So hippo's use 5.020 line sent me off on a search for an explanation of why so many Monks are using $&. Here is what I found in perlvar:
In Perl 5.20.0 a new copy-on-write system was enabled by default, which
finally fixes all performance issues with these three variables, and
makes them safe to use anywhere.
So now I understand, if the notion you are all trying to convey is to use only as much syntax as is necessary to get the job done. I infer from that that you would use grouping only when picking two or more sub-expressions out of a string. If only one sub-expression is desired, then () are not needed and $& is spot on. But TMTOWTDI, so I supplemented hippo's revision with my own code, shown below. That code is also worthy of comment and discussion from y'all, if you care to comment.
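To make sure I have the grouping point straight, here is a minimal sketch using a made-up sample string (the variable names are mine, purely for illustration). With one sub-expression, $& and $1 hold the same text, so the parentheses add nothing; grouping only earns its keep once two pieces are wanted.

```perl
use strict;
use warnings;
use 5.020;    # per perlvar, $& carries no speed penalty from 5.20 on

# Hypothetical sample string, for illustration only.
my $str = 'This is a real number, 123456.56';

# One sub-expression wanted: the whole match is enough, no () needed.
$str =~ /\d+/ or die 'no match';
my $whole = $&;

# Same result with a capture group; the () add nothing here.
$str =~ /(\d+)/ or die 'no match';
my $group = $1;

# Two sub-expressions wanted: now the grouping is doing real work.
my ($int, $frac) = $str =~ /(\d+)\.(\d+)/;

say "whole=$whole group=$group int=$int frac=$frac";
# prints: whole=123456 group=123456 int=123456 frac=56
```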
I have long been fascinated by the commify problem I first encountered in Friedl, and I've spent many enjoyable hours trying to hack a regex that could commify reals whose fractional parts have more than 3 digits, to avoid results like 123,456.334,56. Several Monks' comments made it clear to me that CODE in s/PATTERN/CODE/e really is a code block, just another piece of perl code that is eval-ed under /e, and that the value of the last expression evaluated becomes the replacement. With that I could see the awesome power of a regex within a regex, and getting that usage right was exactly the purpose of my original post. It also explains the odd result in my original post: the inner s///g was the last expression evaluated in that block, and without /r it returns the number of substitutions made rather than the modified string; six substitutions with /g, thus '6.56'. Now, please note my new commify solution below, which solves the problem of not commify-ing the fraction part:
use strict;
use 5.020; # for best efficiency on $&
use Test::More tests => 11;
my $orig = 'This is a real number, 123456.56';
my $want = 'This is a real number, 493824.56';
(my $have = $orig) =~ s/\d+/sprintf "%i", 4 * $&/e;
my $intpart = $&;
is $intpart, 123456, 'int part captured';
is $have, $want, 'Multiply int part by 4';
($have = $intpart) =~ s/./3/g;
is $have, 333333, 'Int part digits set to "3" trivially, with /./';
($have = $intpart) =~ s/\d/3/g;
is $have, 333333, 'Int part digits set to "3" trivially, with /\\d/';
$want = 'This is a real number, 333333.56';
($have = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g; $int/e;
is $have, $want, 'Replace int part in string with all 3s';
$want = 'This is a real number, 123,456.56';
($have = $orig) =~ s{ \d+ }{
$& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
$intpart = $&;
is $intpart, 123456, '$& int part captured';
is $have, $want, 'Insert commas where appropriate';
($have = $orig) =~ s{ (\d+) }{
$1 =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
$intpart = $1;
is $intpart, 123456, '$1 int part captured';
is $have, $want, 'Insert commas where appropriate';
($have = $orig) =~ s{ (?<int> (\d+)) }{
$+{int} =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
$intpart = $+{int};
is $intpart, 123456, '<int> part captured';
is $have, $want, 'Insert commas where appropriate';
exit(0);
__END__
It also works with 123456.56567. Try it. So, it is one solution shown three ways: with $&, and with grouping via $1 and $+{name}. Again, I concede that $& is right-on unless more than one sub-expression is targeted. Is that a fair rendering of the grouping issues in this dialogue? If not, I'm still a student of perl, since next to calculus, perl is one of the world's greatest inventions :-) Also, I am enough of a nerd to want to learn more about the changes wrought in 5.20, that copy-on-write system, that make $& safe to use. Please suggest a homework reading assignment? Thanks again for a very stimulating discussion.
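A footnote on the '6.56' result from my original post. The sketch below is my reconstruction of the idea, not the original code: it assumes the inner s///g was the last expression in the /e block. It shows the three cases side by side: the substitution count leaking out, the explicit last expression, and /r.

```perl
use strict;
use warnings;
use 5.020;

my $orig = 'This is a real number, 123456.56';

# Without /r the inner s///g edits $int in place and returns the number
# of substitutions made, so that count becomes the replacement text.
(my $count_version = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g/e;

# Naming $int as the last expression returns the modified string instead.
(my $last_expr = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g; $int/e;

# With /r the inner s/// works on a copy of $& and returns the new string.
(my $r_version = $orig) =~ s/\d+/$& =~ s#.#3#gr/e;

say $count_version;    # This is a real number, 6.56
say $last_expr;        # This is a real number, 333333.56
say $r_version;        # This is a real number, 333333.56
```

The /r form is the shortest of the three because there is no temporary variable to declare or return.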
Greetings tybalt89
I wanted to see if I could make yours work with mine, and with a minor tweak, I did. I prefer to use assignment with s///, as in:
(my $tybalt89 = $str) =~ s/...
You did not; to each his own, but to make yours work in my preferred model I removed the /r from the outer regex expression. Once I did that I was able to replace my CODE statement with yours, as a drop-in. I teach math to undergrads with Perl, Python and R, and I have to be able to answer their questions, so I had to understand exactly what is going on here. I also added an extra digit, to see multiple ',' insertions and put the real number into a string. TMTOWTDI, works great, as in:
use 5.20.0;
use strict;
my $n = 4;
my $raw = (1234567.56567);
my $orig = (1234567.56567 * $n);
say "raw real number: $raw";
say "factored real nbr: $orig";
my $str = "This is a real number ${\($raw * $n)}.";
say $str;
(my $tybalt89 = $str) =~ s{ \d+ }{
$& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xrg }xe;
say "tybalt89's with assignment and \\r tweak => \'$tybalt89\'";
(my $perlboy = (($raw) =~ s{ \d+\.?\d* }{ $& * 4 }erx )) =~ s{ \d+ }{
$& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
say "perlboy's => \'$perlboy\'";
(my $mixed = (($raw) =~ s{ \d+\.?\d* }{ $& * 4 }erx )) =~ s{ \d+ }{
$& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xrg }xe;
say "perboy's w/tybalt89's drop-in with \\r tweak => \'$mixed\'";
say "tybalt89's original regex, without assignment => ",
($str) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;
say "tybalt89's original regex, without assignment => ",
(1234567.56567 * 4) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;
exit(0)
__END__
which when run yields:
raw real number: 1234567.56567
factored real nbr: 4938270.26268
This is a real number 4938270.26268.
tybalt89's with assignment and \r tweak => 'This is a real number 4,938,270.26268.'
perlboy's => '4,938,270.26268'
perboy's w/tybalt89's drop-in with \r tweak => '4,938,270.26268'
tybalt89's original regex, without assignment => This is a real number 4,938,270.26268.
tybalt89's original regex, without assignment => 4,938,270.26268
As I said, the only change to yours was to remove /r from the outer s///. To verify it would insert multiple ',', I added an extra digit and interpolated the real number into a string. So, that \B assertion works, and by removing two look-arounds, probably runs faster than mine.
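To convince myself the two inner patterns are interchangeable, here is a small sketch that wraps each one in a sub (the sub names are hypothetical, mine only) and compares them on a few digit strings. It assumes the input is already the isolated integer part, as $& is after the outer \d+ match; the \B version depends on that because of its $ anchor.

```perl
use strict;
use warnings;
use 5.020;

# Look-around version: a comma goes after a digit that is followed by one
# or more complete groups of three digits and then no further digit.
sub commify_lookaround {
    my ($digits) = @_;
    return $digits =~ s/ (?<=\d) (?= (?:\d{3})+ (?!\d) ) /,/xgr;
}

# \B version: a comma goes at any non-boundary position that is followed
# by complete groups of three digits running up to the end of the string.
sub commify_nonboundary {
    my ($digits) = @_;
    return $digits =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr;
}

for my $n (qw(1 12 123 1234 123456 1234567 4938270)) {
    my $la = commify_lookaround($n);
    my $nb = commify_nonboundary($n);
    die "mismatch for $n: $la vs $nb" unless $la eq $nb;
    say "$n => $la";
}
```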
UPDATE 9/17/2023
Running timethese(-10, {...});
Benchmark: running perlboy, tybalt89 for at least 10 CPU seconds...
perlboy: 10 wallclock secs (10.52 usr + 0.00 sys = 10.52 CPU) @ 25551.33/s (n=268800)
tybalt89: 11 wallclock secs (10.45 usr + 0.00 sys = 10.45 CPU) @ 27807.18/s (n=290585)
Running cmpthese(-10, {...});
            Rate  perlboy tybalt89
perlboy  25623/s       --      -8%
tybalt89 27900/s       9%       --
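The hash passed to timethese and cmpthese is elided above, so what follows is only a sketch of how such a comparison could be set up with the core Benchmark module. The sample string, the %candidate hash and the sub bodies are assumptions for illustration, not the code that produced the timings shown.

```perl
use strict;
use warnings;
use 5.020;
use Benchmark qw(cmpthese);

# Hypothetical input; the string used for the timings above is not shown.
my $str = 'This is a real number 4938270.26268.';

my %candidate = (
    perlboy => sub {
        (my $s = $str) =~ s{ \d+ }{
            $& =~ s/ (?<=\d) (?= (?:\d{3})+ (?!\d) ) /,/xgr }xe;
        return $s;
    },
    tybalt89 => sub {
        (my $s = $str) =~ s{ \d+ }{
            $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xe;
        return $s;
    },
);

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds; -1 keeps this sketch quick, -10 matches the runs above.
cmpthese(-1, \%candidate);
```

Rates will differ from machine to machine, so only the relative ordering is worth reading from a run like this.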
Cheers, and happy perl-ing
Nice try, but that was not my use case. I only multiplied the int part to prove I could do it from the CODE part of s///. Now let's say that IS my use case, that I want to multiply AND commify WITHOUT grouping, and the real number MUST be in a string, as in hippo's test paradigm, to wit:
$orig = 'This is a real number, 123456.56567';
$want = 'This is a real number, 493,826.26268';
I can do it with mine but not yours, as in:
$want = 'This is a real number, 493,826.26268';
($have = (($orig) =~ s{ \d+\.?\d* }{ $& * 4 }erx )) =~ s{ \d+ }{
$& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
$intpart = $&;
is $intpart, 493826, '$& int part captured';
is $have, $want, 'Insert commas where appropriate';
And as the Monks prefer, no grouping () to be seen. That is three s/// in a one-liner. First, grab the real number and factor it; next, grab just the int part and commify it; finally, put it back in the string. Man, s/PATTERN/CODE/e is potent mojo.
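For anyone who finds the one-liner hard to read, here is the same idea unrolled into separate statements. This is a sketch only: it uses /r on the outer substitutions instead of my assignment idiom, and $scaled is a name I made up for the intermediate string.

```perl
use strict;
use warnings;
use 5.020;

my $orig = 'This is a real number, 123456.56567';

# Step 1: find the whole real number and replace it with four times its value.
my $scaled = $orig =~ s{ \d+ \.? \d* }{ $& * 4 }xer;

# Steps 2 and 3: the outer s/// finds the first run of digits (the integer
# part); the inner s///r returns it with commas inserted, and that value is
# what the outer s/// puts back into the string.
my $have = $scaled =~ s{ \d+ }{
    $& =~ s/ (?<=\d) (?= (?:\d{3})+ (?!\d) ) /,/xgr
}xer;

say $scaled;    # This is a real number, 493826.26268
say $have;      # This is a real number, 493,826.26268
```

Because the outer s/// has no /g, only the first digit run is commified and the fraction is left alone.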