Re^3: regex in REPLACEMENT in s///

Replies are listed 'Best First'.
Re^4: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 14, 2023 at 16:16 UTC
Greetings Monks, especially hippo :-)

I had no idea when I posted this question that it would attract so many Monks, or that so much of the discussion would be about grouping. The inference I draw is that I offended Perl sensibilities by using () when it wasn't necessary. My bad, but that deserves a bit of discussion, especially since hippo included this line in his rejoinder:

    use 5.020; # for best efficiency on $&

I don't change my perl very often, and I depend most heavily for documentation on 'Programming Perl', 3rd and 4th editions. In the early days I used whatever perl was installed on my employer's or client's machines. When I first put perl up on a personal machine it was 5.6.1, consistent with PP 3rd, and then 5.18 (close to PP 4th's 5.16). Later I jumped to 5.34 and now 5.36, but I still rely on PP, which advises against using any of $`, $& and $'. I used them only when debugging a difficult regex I was writing. Now I just use Regexp::Debugger.

So hippo's pragma invocation sent me off on a search for an explanation of why so many Monks are using $&. Here is what I found in perlvar:

    In Perl 5.20.0 a new copy-on-write system was enabled by default, which finally fixes all performance issues with these three variables, and makes them safe to use anywhere.

So I now understand, if the notion you are all trying to convey to me is to use only as much syntax as is necessary to get the job done. I infer from that that you would use grouping only if you were trying to pick two or more sub-expressions out of a string. If only one sub-expression is desired, then () are not needed and $& is spot on. But, TMTOWTDI, so I supplemented hippo's revision with my own code shown below, and that code is also worthy of comment and discussion from y'all, if you care to comment.
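As an aside, here is a minimal sketch of the grouping point, assuming perl 5.20 or later. The sample string and variable names are illustrative only and are not code from the thread. It shows $&, $1 and a named capture all yielding the same single sub-expression, with grouping only becoming necessary once two pieces are wanted.

```perl
use strict;
use warnings;
use 5.020;

my $text = 'This is a real number, 123456.56';   # illustrative sample string

# One sub-expression wanted: no parentheses needed, $& holds the whole match.
my $whole;
$whole = $& if $text =~ /\d+/;

# The same result with a numbered group and with a named group.
my $grouped;
$grouped = $1 if $text =~ /(\d+)/;

my $named;
$named = $+{int} if $text =~ /(?<int>\d+)/;

# Two sub-expressions wanted: here grouping is what separates them.
my ($int, $frac) = $text =~ /(\d+)\.(\d+)/;

say "whole=$whole grouped=$grouped named=$named int=$int frac=$frac";
```

All three single-value forms pick out the same digits; the choice among them is largely a matter of how much syntax the job needs.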
I have long been fascinated by the commify problem I first encountered in Friedl, and I've spent many enjoyable hours trying to hack a regex that could commify reals that have fractional parts longer than 3 digits, to avoid 123,456.334,56. Several Monks' comments made it clear to me that CODE in s/PATTERN/CODE/e is really a code block, just another piece of perl code that is eval-ed because of /e, and that the value of the last expression evaluated becomes the replacement (bearing in mind that an inner s/// returns a count unless /r is used). Once I grasped that, I could see the awesome power of the notion of a regex within a regex, which was exactly the purpose of my original post: to get that usage right. That also explains the summed boolean results in my original post, since the inner s/// itself was the last expression evaluated in that block; six iterations with /g, thus '6.56'.

Now, please note my new commify solution below, which solves the problem of not commify-ing the fraction part:

    use strict;
    use 5.020;    # for best efficiency on $&
    use Test::More tests => 11;

    my $orig = 'This is a real number, 123456.56';
    my $want = 'This is a real number, 493824.56';
    (my $have = $orig) =~ s/\d+/sprintf "%i", 4 * $&/e;
    my $intpart = $&;
    is $intpart, 123456, 'int part captured';
    is $have, $want, 'Multiply int part by 4';

    ($have = $intpart) =~ s/./3/g;
    is $have, 333333, 'Int part digits set to "3" trivially, with /./';

    ($have = $intpart) =~ s/\d/3/g;
    is $have, 333333, 'Int part digits set to "3" trivially, with /\\d/';

    # Inner s///g returns a count, so return the modified copy explicitly.
    $want = 'This is a real number, 333333.56';
    ($have = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g; $int/e;
    is $have, $want, 'Replace int part in string with all 3s';

    # Commify with $& (no grouping).
    $want = 'This is a real number, 123,456.56';
    ($have = $orig) =~ s{ \d+ }{ $& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
    $intpart = $&;
    is $intpart, 123456, '$& int part captured';
    is $have, $want, 'Insert commas where appropriate';

    # Commify with a numbered group.
    ($have = $orig) =~ s{ (\d+) }{ $1 =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
    $intpart = $1;
    is $intpart, 123456, '$1 int part captured';
    is $have, $want, 'Insert commas where appropriate';

    # Commify with a named group.
    ($have = $orig) =~ s{ (?<int> (\d+)) }{ $+{int} =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
    $intpart = $+{int};
    is $intpart, 123456, '<int> part captured';
    is $have, $want, 'Insert commas where appropriate';

    exit(0);
    __END__

It also works with 123456.56567. Try it.

So, one solution with $&, plus variants that use grouping with $1 and $+{name}, but again, I concede that $& is right-on unless more than one sub-expression is targeted. Is that a fair rendering of the grouping issues in this dialogue? If not, I'm still a student of perl, since next to calculus, perl is one of the world's greatest inventions :-)

Also, I am enough of a nerd to want to learn more about the changes wrought in 5.20, that copy-on-write system, that make $& safe to use. Please suggest a homework reading assignment?

Thanks again for a very stimulating discussion.
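The '6.56' effect described in this post can be reproduced in isolation. The following is a minimal sketch, assuming perl 5.14 or later for /r; the variable names `$count_version` and `$copy_version` are made up for illustration. It contrasts an inner s///g whose return value is the substitution count with one that uses /r to return the modified copy.

```perl
use strict;
use warnings;
use 5.020;

my $orig = 'This is a real number, 123456.56';

# Without /r, the inner s///g evaluates to the number of substitutions
# it made. That count is the last value in the /e block, so it becomes
# the replacement text.
(my $count_version = $orig) =~ s{\d+}{ (my $int = $&) =~ s/\d/3/g }e;

# With /r, the inner s///g evaluates to the modified copy instead,
# and $& itself is left untouched.
(my $copy_version = $orig) =~ s{\d+}{ $& =~ s/\d/3/gr }e;

say $count_version;   # This is a real number, 6.56
say $copy_version;    # This is a real number, 333333.56
```

The first form is why an explicit trailing `$int` (or /r) is needed whenever the replacement block ends in a substitution.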
Re^5: regex in REPLACEMENT in s/// by tybalt89 (Monsignor) on Sep 14, 2023 at 21:27 UTC
Maybe slightly cleaner?

    (123456.56567 * 4) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer

outputs

    493,826.26268
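For reuse, the same idea can be wrapped in a small sub. This is only a sketch: the name `commify_int_part` is hypothetical, and it assumes the argument contains a single plain decimal number whose first run of digits is the integer part.

```perl
use strict;
use warnings;
use 5.020;

# Hypothetical helper: commify only the first run of digits in the
# argument, leaving any fractional part alone.
sub commify_int_part {
    my ($number) = @_;
    # Outer s///er grabs the integer part; the inner s///gr inserts a
    # comma at every non-boundary position that is followed by a
    # multiple of three digits up to the end of that integer part.
    return $number =~ s{\d+}{ $& =~ s/\B(?=(?:\d{3})+$)/,/gr }er;
}

say commify_int_part('123456.334567');   # 123,456.334567
say commify_int_part('1234567.5');       # 1,234,567.5
say commify_int_part('999');             # 999
```

Because the inner `$` anchors to the end of `$&` rather than the end of the whole string, digits after the decimal point never take part in the comma placement.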
Re^6: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 16, 2023 at 17:48 UTC
Greetings tybalt89,

I wanted to see if I could make yours work with mine, and with a minor tweak, I did. I prefer to use assignment with s///, as in:

    (my $tybalt89 = $str) =~ s/...

You did not; to each his own, but to make yours work in my preferred model I removed the /r from the outer s///. Once I did that I was able to replace my CODE statement with yours, as a drop-in. I teach math to undergrads with Perl, Python and R, and I have to be able to answer their questions, so I had to understand exactly what is going on here. I also added an extra digit, to see multiple ',' insertions, and put the real number into a string. TMTOWTDI, works great, as in:

    use 5.20.0;
    use strict;

    my $n = 4;
    my $raw = (1234567.56567);
    my $orig = (1234567.56567 * $n);
    say "raw real number: $raw";
    say "factored real nbr: $orig";
    my $str = "This is a real number ${\($raw * $n)}.";
    say $str;

    (my $tybalt89 = $str) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xrg }xe;
    say "tybalt89's with assignment and \\r tweak => \'$tybalt89\'";

    (my $perlboy = (($raw) =~ s{ \d+\.?\d* }{ $& * 4 }erx ))
        =~ s{ \d+ }{ $& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
    say "perlboy's => \'$perlboy\'";

    (my $mixed = (($raw) =~ s{ \d+\.?\d* }{ $& * 4 }erx ))
        =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xrg }xe;
    say "perboy's w/tybalt89's drop-in with \\r tweak => \'$mixed\'";

    say "tybalt89's original regex, without assignment => ",
        ($str) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;
    say "tybalt89's original regex, without assignment => ",
        (1234567.56567 * 4) =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;

    exit(0);
    __END__

which when run yields:

    raw real number: 1234567.56567
    factored real nbr: 4938270.26268
    This is a real number 4938270.26268.
    tybalt89's with assignment and \r tweak => 'This is a real number 4,938,270.26268.'
    perlboy's => '4,938,270.26268'
    perboy's w/tybalt89's drop-in with \r tweak => '4,938,270.26268'
    tybalt89's original regex, without assignment => This is a real number 4,938,270.26268.
    tybalt89's original regex, without assignment => 4,938,270.26268

As I said, the only change to yours was to remove /r from the outer s///. So, that \B assertion works, and by removing two look-arounds, it probably runs faster than mine.

UPDATE 9/17/2023

Running timethese(-10, {...});

    Benchmark: running perlboy, tybalt89 for at least 10 CPU seconds...
       perlboy: 10 wallclock secs (10.52 usr + 0.00 sys = 10.52 CPU) @ 25551.33/s (n=268800)
      tybalt89: 11 wallclock secs (10.45 usr + 0.00 sys = 10.45 CPU) @ 27807.18/s (n=290585)

Running cmpthese(-10, {...});

                Rate  perlboy tybalt89
    perlboy  25623/s       --      -8%
    tybalt89 27900/s       9%       --

Cheers, and happy perl-ing
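The benchmark code itself was not shown in the post, so the following is only a guess at how such a comparison might be set up with the core Benchmark module. The hash `%candidates`, the test string, and the 1-second run time (rather than the 10 seconds used above) are all assumptions made for illustration.

```perl
use strict;
use warnings;
use 5.020;
use Benchmark qw(cmpthese);

# Assumed input: same shape as the string used in the post.
my $str = 'This is a real number 4938270.26268.';

# Each candidate returns a commified copy of $str via the /r flag.
my %candidates = (
    perlboy => sub {
        return $str =~ s{ \d+ }{ $& =~ s/ (?<=\d) (?= (?:\d{3})+ (?!\d) ) /,/xgr }xer;
    },
    tybalt89 => sub {
        return $str =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;
    },
);

# Sanity check before timing: both approaches must agree.
die "results differ" unless $candidates{perlboy}->() eq $candidates{tybalt89}->();

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

Absolute rates will differ by machine and perl build, so only the relative percentages are worth comparing.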
Re^6: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 15, 2023 at 06:40 UTC
Nice try, but that was not my use case. I only multiplied the int part to prove I could do it from the CODE part of s///. Now let's say that IS my use case: I want to multiply AND commify WITHOUT grouping, and the real number MUST be in a string, as in hippo's test paradigm, to wit:

    $orig = 'This is a real number, 123456.56567';
    $want = 'This is a real number, 493,826.26268';

I can do it with mine but not yours, as in:

    $want = 'This is a real number, 493,826.26268';
    ($have = (($orig) =~ s{ \d+\.?\d* }{ $& * 4 }erx ))
        =~ s{ \d+ }{ $& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg; }ex;
    $intpart = $&;
    is $intpart, 493826, '$& int part captured';
    is $have, $want, 'Insert commas where appropriate';

And as the Monks prefer, no grouping () to be seen. That is three s/// in a one-liner. First, grab the real number and factor it; next, grab just the int part and commify it; finally, put it back in the string. Man, s/PATTERN/CODE/e is potent mojo.
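The three steps described above can also be spelled out one statement at a time. This is a sketch only: the sub name `scale_and_commify` and its `$factor` parameter are hypothetical, it handles just the first number in the string, and it borrows the \B variant of the inner pattern from elsewhere in the thread.

```perl
use strict;
use warnings;
use 5.020;

# Hypothetical helper: multiply the first real number in $text by
# $factor, then commify the integer part of the result.
sub scale_and_commify {
    my ($text, $factor) = @_;

    # Step 1: grab the real number and factor it, returning a copy.
    my $scaled = $text =~ s{ \d+ (?: \. \d+ )? }{ $& * $factor }xer;

    # Steps 2 and 3: grab just the int part, commify it, and put it
    # back in the string.
    return $scaled =~ s{ \d+ }{ $& =~ s/ \B (?= (?:\d{3})+ $ ) /,/xgr }xer;
}

say scale_and_commify('This is a real number, 123456.56567', 4);
# This is a real number, 493,826.26268
```

Since the scaling is ordinary floating-point arithmetic, factors other than small powers of two may produce results that stringify with extra digits; formatting with sprintf before the commify step would be one way to control that.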
Re^7: regex in REPLACEMENT in s/// by hippo (Archbishop) on Sep 15, 2023 at 09:08 UTC
Re^8: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 15, 2023 at 16:35 UTC