Re: regex in REPLACEMENT in s///

Re^2: regex in REPLACEMENT in s/// by choroba (Cardinal) on Sep 12, 2023 at 20:05 UTC
Just a nitpick: you only need parentheses in the pattern when there's a capture, so you can remove them most of the time.

    (my $adj = $1) =~ s/\d/3/g; # No parentheses needed.

    ($res = $str) =~ s{ (\d+) }{ ($adj = $1) =~ s/\d/3/g; $adj }xe; # Parentheses only needed in the outer substitution.
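A minimal sketch of the nitpick above, assuming the sample string used elsewhere in this thread. The variable names are made up for illustration. It compares a substitution whose capture group is never read with the same substitution without the group. Note that the parentheses around `(my $x = $str)` are ordinary Perl grouping, needed so that `=~` binds to the copy, and are unrelated to regex capture groups.

```perl
use strict;
use warnings;

my $str = 'This is a real number, 123456.56';

# Capture group present but never read: the regex parentheses do no work here.
(my $with_parens = $str) =~ s/(\d)/3/g;

# The same substitution without the group.
(my $without_parens = $str) =~ s/\d/3/g;

print "$with_parens\n";
print "$without_parens\n";
print $with_parens eq $without_parens ? "identical\n" : "different\n";
```

Both copies should come out the same, since nothing ever reads `$1` in the first form.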
Re^3: regex in REPLACEMENT in s/// by hippo (Archbishop) on Sep 13, 2023 at 09:52 UTC
Indeed so. That would be just one of quite a few changes I would make to perlboy_emeritus's code if I were writing it myself. Instead, I tried to make the smallest number of changes to his original code to achieve what should be his desired outcome so that it is clearest that these are the changes needed to address the stated problem. If writing it myself but still keeping s/// as the operator under test it would probably look like this:

    use strict;
    use warnings;
    use 5.020; # for best efficiency on $&
    use Test::More tests => 4;

    my $orig = 'This is a real number, 123456.56';
    my $want = 'This is a real number, 493824.56';
    (my $have = $orig) =~ s/\d+/sprintf "%i", 4 * $&/e;
    my $intpart = $&;
    is $intpart, 123456, 'int part captured';
    is $have, $want, 'Multiply int part by 4';

    ($have = $intpart) =~ s/./3/g;
    is $have, 333333, 'Int part digits set to "3" trivially';

    $want = 'This is a real number, 333333.56';
    ($have = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g; $int/e;
    is $have, $want, 'Replace int part in string with all 3s';

Look, Ma - no capture groups! :-) 🦛
Re^4: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 14, 2023 at 16:16 UTC
Greetings Monks, especially hippo :-)

I had no idea when I posted this question that it would attract so many Monks, or that so much of the discussion would be about grouping. The inference I draw is that I offended Perl sensibilities by using () when it wasn't necessary. My bad, but that deserves a bit of discussion, especially since hippo included this line in his rejoinder: `use 5.020; # for best efficiency on $&`

I don't change my perl very often, and I depend most heavily for documentation on 'Programming Perl', 3rd and 4th editions. In the early days I used whatever perl was installed on my employer's or client's machines. When I first put perl up on a personal machine it was 5.6.1, consistent with PP 3rd, and then 5.18 (close to PP 4th's 5.16). Later I jumped to 5.34 and now 5.36, but I still rely on PP, which advises against using any of $`, $& and $'. I used them only when debugging a difficult regex I was writing. Now I just use Regexp::Debugger. So hippo's version requirement sent me off on a search for an explanation of why so many Monks are using $&. Here is what I found in perlvar:

"In Perl 5.20.0 a new copy-on-write system was enabled by default, which finally fixes all performance issues with these three variables, and makes them safe to use anywhere."

So I now understand, provided the notion you are all trying to convey to me is to use only as much syntax as is necessary to get the job done. I infer from that that you would only use grouping if you were trying to pick two or more sub-expressions out of a string. If only one sub-expression is desired, then () are not needed and $& is spot on. But TMTOWTDI, so I supplemented hippo's revision with my own code, which is also worthy of comment and discussion from y'all, if you care to comment.
I have long been fascinated by the commify problem I first encountered in Friedl, and I've spent many enjoyable hours trying to hack a regex that could commify reals whose fractional parts are longer than 3 digits, to avoid 123,456.334,56. After grasping from the comments of several Monks that CODE in s/PATTERN/CODE/e is really a code block, or just another piece of perl code that can be eval-ed with /e, and that the last expression evaluated is returned (except, of course, when using /r), I could see the awesome power of the notion of a regex within a regex, which was exactly the purpose of my original post: to get that usage right. That also explains the summed boolean results in my original post, since the s/// itself is the last expression evaluated in that block; six iterations with /g, thus '6.56'. Now, please note my new commify solution below, which solves the problem of not commify-ing the fraction part:

    use strict;
    use 5.020; # for best efficiency on $&
    use Test::More tests => 11;

    my $orig = 'This is a real number, 123456.56';
    my $want = 'This is a real number, 493824.56';
    (my $have = $orig) =~ s/\d+/sprintf "%i", 4 * $&/e;
    my $intpart = $&;
    is $intpart, 123456, 'int part captured';
    is $have, $want, 'Multiply int part by 4';

    ($have = $intpart) =~ s/./3/g;
    is $have, 333333, 'Int part digits set to "3" trivially, with /./';

    ($have = $intpart) =~ s/\d/3/g;
    is $have, 333333, 'Int part digits set to "3" trivially, with /\\d/';

    $want = 'This is a real number, 333333.56';
    ($have = $orig) =~ s/\d+/(my $int = $&) =~ s#.#3#g; $int/e;
    is $have, $want, 'Replace int part in string with all 3s';

    $want = 'This is a real number, 123,456.56';
    ($have = $orig) =~ s{ \d+ }{
        $& =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg;
    }ex;
    $intpart = $&;
    is $intpart, 123456, '$& int part captured';
    is $have, $want, 'Insert commas where appropriate';

    ($have = $orig) =~ s{ (\d+) }{
        $1 =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg;
    }ex;
    $intpart = $1;
    is $intpart, 123456, '$1 int part captured';
    is $have, $want, 'Insert commas where appropriate';

    ($have = $orig) =~ s{ (?<int> (\d+)) }{
        $+{int} =~ s/ (?<=\d) (?= (?:\d{3} )+ (?!\d) ) /,/xrg;
    }ex;
    $intpart = $+{int};
    is $intpart, 123456, '<int> part captured';
    is $have, $want, 'Insert commas where appropriate';

    exit(0);
    __END__

It also works with 123456.56567. Try it.

So, one solution written three ways: with $&, with $1 and with $+{name}. Again, I concede that $& is right-on unless more than one sub-expression is targeted. Is that a fair rendering of the grouping issues in this dialogue? If not, I'm still a student of perl, since next to calculus, perl is one of the world's greatest inventions :-)

Also, I am enough of a nerd to want to learn more about the changes wrought in 5.20, that copy-on-write system, that make $& safe to use. Please suggest a homework reading assignment?

Thanks again for a very stimulating discussion.
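The '6.56' effect and the /r fix described in the post above can be sketched as follows. This is an illustration under stated assumptions, not the original poster's code: the variable names and the helper `commify_int_part` are hypothetical, and the inner substitutions work on a lexical copy of `$&` purely to keep the sketch simple.

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier needs 5.14 or later; this also enables say

my $orig = 'This is a real number, 123456.56';

# Inner s///g without /r: the last value in the /e code is the number of
# substitutions made, so that count becomes the replacement text.
(my $count_version = $orig) =~ s{\d+}{ (my $tmp = $&) =~ s/\d/3/g }e;

# Inner s///gr: the last value is the modified copy of the digits.
(my $copy_version = $orig) =~ s{\d+}{ my $int = $&; $int =~ s/\d/3/gr }e;

# Hypothetical helper: commify only the first run of digits. The outer
# substitution has no /g, so the fractional digits are never matched.
sub commify_int_part {
    my ($text) = @_;
    return $text =~ s{\d+}{ my $int = $&; $int =~ s/(?<=\d)(?=(?:\d{3})+(?!\d))/,/gr }er;
}

say $count_version;    # This is a real number, 6.56
say $copy_version;     # This is a real number, 333333.56
say commify_int_part('This is a real number, 123456.56567');
                       # This is a real number, 123,456.56567
```

The helper relies on the integer part being the first run of digits in the string. Text with an earlier number in it would need a more specific outer pattern.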
Re^5: regex in REPLACEMENT in s/// by tybalt89 (Monsignor) on Sep 14, 2023 at 21:27 UTC
Re^6: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 16, 2023 at 17:48 UTC
Re^6: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 15, 2023 at 06:40 UTC
Re^3: regex in REPLACEMENT in s/// by Bod (Parson) on Sep 12, 2023 at 21:57 UTC
"you only need parentheses when there's a capture"

Are they not also needed where there is optionality? Could this be (sensibly) rewritten without the parentheses? `$example =~ s/hiss(es)?/leak/;` The aim is to replace hiss/hisses with leak, but not hisse or hisss.
Re^4: regex in REPLACEMENT in s/// by haukex (Archbishop) on Sep 13, 2023 at 06:37 UTC
"Could this be (sensibly) rewritten without the parentheses? `$example =~ s/hiss(es)?/leak/;`"

Although tybalt89 probably can think of some ways, my answer would be: No, I would keep the parentheses in that case. However, they can be made non-capturing by writing `/hiss(?:es)?/` (or with the `/n` modifier, new since 5.22). I think choroba's point was that in `s/(\d)/3/g`, the parens serve no purpose at all.
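A small sketch of the non-capturing form, run against the four words from the question. The hash names are illustrative only. One assumption is labeled in the code: the `\b` anchors in the second substitution are not part of the pattern quoted in this thread. They are added here because the unanchored pattern still matches the "hiss" at the start of "hisse" and "hisss".

```perl
use strict;
use warnings;

my (%plain, %anchored);

for my $word (qw(hiss hisses hisse hisss)) {
    # (?:...) groups "es" for the ? quantifier without capturing it.
    ($plain{$word} = $word) =~ s/hiss(?:es)?/leak/;

    # Assumption: word-boundary anchors, not in the quoted pattern,
    # so that only the whole words "hiss" and "hisses" are replaced.
    ($anchored{$word} = $word) =~ s/\bhiss(?:es)?\b/leak/;

    print "$word -> $plain{$word} / $anchored{$word}\n";
}

# After a successful match with only a non-capturing group, $1 stays unset.
if ('hisses' =~ /hiss(?:es)?/) {
    print defined $1 ? "captured\n" : "nothing captured\n";
}
```

With the unanchored pattern, "hisse" becomes "leake" and "hisss" becomes "leaks". The anchored variant leaves both untouched.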
Re^5: regex in REPLACEMENT in s/// by Bod (Parson) on Sep 13, 2023 at 10:42 UTC
Re^2: regex in REPLACEMENT in s/// by perlboy_emeritus (Scribe) on Sep 12, 2023 at 20:19 UTC
Thank you hippo; tybalt89 also answered in like fashion, and both answers were spot on. Very much appreciated, as I expect to make much use of this idiom with much more complex regexes.