substitution in regular expression

aeqr has asked for the wisdom of the Perl Monks concerning the following question:

Re: substitution in regular expression by AnomalousMonk (Archbishop) on Apr 23, 2014 at 19:59 UTC
If you just want to extract overlapping triplets without changing the original string: `c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'ABCDEF'; ;; my @triplets = $s =~ m{ (?= (...)) }xmsg; printf qq{'$_' } for @triplets; "` which prints `'ABC' 'BCD' 'CDE' 'DEF'`. If you want to simultaneously do substitutions to change the match string so that it ends up as `'DEF'` or `'EF'`, that's trickier (at least, it's tricky to do with a single substitution operation), but I'm assuming substitution is just an artifact of the potential approach you happened to come up with, i.e., it's an XY Problem. Please advise on this point. Update: See Re^3: substitution in regular expression for a string-modifying `s///` solution.
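For readers who prefer a script to a one-liner, the lookahead capture above can be sketched roughly as follows. The sub name `overlapping_triplets` is invented for illustration; the pattern is the same `(?= (...))` idea, and the original string is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption, not from the thread):
# return every overlapping 3-character window of $s.
sub overlapping_triplets {
    my ($s) = @_;
    # The lookahead consumes nothing, so after each zero-length match the
    # engine advances one character and tries again; with /g in list
    # context, the captured group from every match is returned.
    my @out = $s =~ m{ (?= (...) ) }xmsg;
    return @out;
}

my $s        = 'ABCDEF';
my @triplets = overlapping_triplets($s);
print "'$_' " for @triplets;    # 'ABC' 'BCD' 'CDE' 'DEF'
print "\n";
print "string is still '$s'\n";
```

A string shorter than three characters simply yields an empty list, because the lookahead can never succeed.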
Re^2: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 20:06 UTC
Thanks for the help. It's OK to modify the string. I would like to do it in the way I described if it's possible.
Re^3: substitution in regular expression by AnomalousMonk (Archbishop) on Apr 23, 2014 at 20:26 UTC
Maybe (?) something like this is what you want? `c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'ABCDEF'; ;; my @triplets; $s =~ s{ (?= (...)) . }{ push @triplets, $1; ''; }xmsge; ;; print qq{'$s'}; printf qq{'$_' } for @triplets; "` which prints `'EF'` followed by `'ABC' 'BCD' 'CDE' 'DEF'`. Update: Here's another `s///` solution. I'm not sure I like it so much: I'm always suspicious, perhaps without cause, of code embedded in a regex. In addition, the `@triplets` array must be a package global (ideally local-ized) due to a bug in lexical management that wasn't fixed until Perl version 5.16 or 5.18 (I think; I don't have access to these versions and I'm too lazy to check the on-line docs). `c:\@Work\Perl\monks>perl -wMstrict -le "my $s = 'ABCDEF'; ;; local our @triplets; $s =~ s{ (?= (...) (?{ push @triplets, $^N })) . }''xmsg; ;; print qq{'$s'}; printf qq{'$_' } for @triplets; "` prints the same output: `'EF'` followed by `'ABC' 'BCD' 'CDE' 'DEF'`.
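Written out as a script, the first `s///e` approach above might look like this minimal sketch. The sub name `consume_triplets` is invented for illustration; the substitution itself follows the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): strip leading characters
# from a copy of $s while recording each triplet seen on the way.
# Returns the leftover string followed by the triplets.
sub consume_triplets {
    my ($s) = @_;
    my @triplets;
    # /e evaluates the replacement as Perl code. The lookahead captures
    # three characters without consuming them; the lone '.' consumes one.
    # The block records the triplet and returns '' so that one character
    # is deleted.
    $s =~ s{ (?= (...) ) . }{ push @triplets, $1; '' }xmsge;
    return ( $s, @triplets );
}

my ( $rest, @triplets ) = consume_triplets('ABCDEF');
print "rest: '$rest'\n";          # rest: 'EF'
print "triplets: @triplets\n";    # triplets: ABC BCD CDE DEF
```

Because the replacement side of `s///e` is ordinary Perl code rather than a `(?{...})` block inside the pattern, a plain lexical `@triplets` is fine here.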
Re^3: substitution in regular expression by AnomalousMonk (Archbishop) on Apr 23, 2014 at 20:09 UTC
So how should the string end up, as `'DEF'` or as `'EF'`?
Re^4: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 20:15 UTC
Re^5: substitution in regular expression by Anonymous Monk on Apr 23, 2014 at 20:21 UTC
Re: substitution in regular expression by Anonymous Monk on Apr 23, 2014 at 19:54 UTC
You've placed code (`push(@dic,$1);`) inside the replacement part of the substitution. Each three-letter code will be replaced by the string "`push(@dic,$1);`" instead of the code being executed, and because of that the string will never get shorter. Even though you could get the code to execute by adding the `/e` modifier on the regex, it still wouldn't do what you want (since the replacement value would be the return value of the `push` call), and so it's better to just move that code outside the regular expression. Since you're matching three letters with your regular expression, and you want to replace those with the last two of those three letters, it's easier to just write it that way: `while(length($line)>2){ $line =~ s/([A-Z]([A-Z]{2}))/$2/; push(@dic, $1); }` I'm sure other monks will have (TI)MTOWTDI and more elegant solutions, but the above gets what you want with only a few changes.
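One caveat with the loop above: if `$line` contains anything other than `A-Z`, the substitution can stop matching while `length($line)>2` is still true, and the loop then never ends. A possible variation, offered only as a sketch, uses the substitution itself as the loop condition. The sub name `build_dictionary` and the idea of passing the string as an argument are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical variant (name and signature are assumptions): loop only
# while the substitution actually succeeds, so non-matching input cannot
# cause an endless loop.
sub build_dictionary {
    my ($line) = @_;
    my @dic;
    # Same pattern as in the post: match three capitals, keep the last two.
    # s/// returns true only when it replaced something.
    while ( $line =~ s/([A-Z]([A-Z]{2}))/$2/ ) {
        push @dic, $1;    # $1 still holds the triplet from the last match
    }
    return @dic;
}

print join( ' ', build_dictionary('ABCDEF') ), "\n";    # ABC BCD CDE DEF
```

For input made up entirely of capitals this produces the same list as the original loop.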
Re^2: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 20:04 UTC
Thanks for the info. I have tried your solution but it doesn't seem to work :/ Also, I would like to know the way to do it without the while loop. That is, editing the string and saving as I have described...
Re^3: substitution in regular expression by AnomalousMonk (Archbishop) on Apr 23, 2014 at 20:13 UTC
"... it doesn't seem to work :/" But what does that mean? In general, replies along the lines of "it doesn't work" are not helpful. How does it "not work"?
Re^3: substitution in regular expression by Anonymous Monk on Apr 23, 2014 at 20:14 UTC
It seems you've edited your node to remove the while loop you originally had. Please don't do that without marking your updates: it confuses things, because now monks won't know which version of your question to answer. "... it doesn't seem to work" In what way? Do you get an error, or are you seeing unexpected results? Because it works for me: `use Data::Dumper; print Dumper([build_dictionnary()]); sub build_dictionnary{ my $line="ABCDEF"; my @dic; while(length($line)>2){ $line =~ s/([A-Z]([A-Z]{2}))/$2/; push(@dic, $1); } return @dic; }` Output (whitespace compressed): `$VAR1 = [ 'ABC', 'BCD', 'CDE', 'DEF' ];` "I would like to know the way to do it without the while loop." Why?
Re^4: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 20:21 UTC
Re^5: substitution in regular expression by AnomalousMonk (Archbishop) on Apr 23, 2014 at 20:34 UTC
Re^5: substitution in regular expression by Anonymous Monk on Apr 23, 2014 at 20:30 UTC
Re: substitution in regular expression by Laurent_R (Canon) on Apr 23, 2014 at 21:37 UTC
Regex might not be the best way. And don't be misled about whether you are using a loop or not. When you use the `s///g` operator (i.e. with the `g` modifier), you are in effect doing an implicit loop, even if it does not appear to be the case. Likewise, when you use the grep or the map function, it may look as if you are not looping over the source list or array, but you are just doing an implicit loop (and the explicit loop of a for/foreach solution may often be slightly quicker). All this to introduce the fact that I will propose a rather concise solution with an explicit loop in the following Perl one-liner: `$ perl -le 'my $s = "ABCDEF"; print substr $s, $_, 3 for 0..length($s)-3;'` which prints `ABC`, `BCD`, `CDE` and `DEF`, one per line. I did not check, but it is likely to be faster than any regex on large data input. Check it and tell your teacher about your findings on the various solutions, you might get an A+.
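To actually check the speed guess above, one option is the core `Benchmark` module. The following is only a sketch: the sub names, the test input, and the choice of the lookahead regex as the competitor are all assumptions made for illustration, and no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Explicit loop over start offsets, as in the one-liner.
# (Sub name is hypothetical.)
sub triplets_by_substr {
    my ($s) = @_;
    my @out;
    # 0 .. length-3 is an empty range when the string is shorter than 3.
    for my $i ( 0 .. length($s) - 3 ) {
        push @out, substr( $s, $i, 3 );
    }
    return @out;
}

# Lookahead capture from earlier in the thread, for comparison.
# (Sub name is hypothetical.)
sub triplets_by_regex {
    my ($s) = @_;
    my @out = $s =~ m{ (?= (...) ) }xmsg;
    return @out;
}

# Arbitrary test input: 'ABCDEF' repeated 200 times (1200 characters).
my $input = 'ABCDEF' x 200;

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    substr_loop => sub { my @t = triplets_by_substr($input) },
    regex       => sub { my @t = triplets_by_regex($input) },
} );
```

Results will depend on the Perl version and on the input size, so it is worth trying a few different lengths before drawing conclusions.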
Re^2: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 21:54 UTC
Thanks for the additional idea and explanations. Good to see you have a sense of humor as well ;)
Re: substitution in regular expression by trizen (Hermit) on Apr 23, 2014 at 22:15 UTC
One-line solution: `"ABCDEF" =~ /([A-Z]{3})(?{print "$1\n"})(?!)/;`
Re^2: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 22:27 UTC
Thanks for the idea, I'll write it down. Just one thing, could you explain this part: `(?{print "$1\n"})(?!)` I don't understand the question mark before the print block, or why the `(?!)` is needed at the end. I noticed that removing it prints only ABC, but I don't understand why. Thank you.
Re^3: substitution in regular expression by trizen (Hermit) on Apr 23, 2014 at 22:52 UTC
Short explanation: `(?{...})` executes arbitrary Perl code inside a regular expression, and `(?!)` forces the regex engine to fail and backtrack, retrying the match from `last_pos + 1`. When it matches `ABC`, it prints it, fails, backtracks, and tries to match the next three letters starting from `B`, giving us `BCD`. The process repeats until the engine's start position reaches the end of the string. I know, I'm really bad at explaining things to humans, but, fortunately, Athanasius explained this better once. Please see: Re: RegEx + vs. {1,}
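The behaviour described above can be observed by collecting the matches instead of printing them. This is a minimal sketch with some assumptions: the sub name is invented, `$^N` (the most recently closed group) is used in place of `$1`, and the results go into a package variable because, as noted earlier in the thread, lexicals inside `(?{...})` were unreliable on older Perls.

```perl
use strict;
use warnings;

# Package variable rather than a lexical, to stay safe on older Perls.
our @seen;

# Hypothetical helper (name is an assumption): collect every triplet the
# engine reaches before being forced to fail.
sub triplets_by_forced_failure {
    my ($s) = @_;
    @seen = ();
    # ([A-Z]{3})  matches three capitals at the current start position;
    # (?{ ... })  runs as soon as the engine gets past that group;
    # (?!)        can never match, so the attempt fails and the engine
    #             retries one character further along.
    # The match as a whole always fails; only the side effect matters.
    $s =~ /([A-Z]{3})(?{ push @seen, $^N })(?!)/;
    return @seen;
}

my @triplets = triplets_by_forced_failure('ABCDEF');
print "@triplets\n";    # ABC BCD CDE DEF
```

Without the `(?!)`, the first attempt succeeds at `ABC` and the engine stops there, which is why removing it gives only one triplet.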
Re^4: substitution in regular expression by aeqr (Novice) on Apr 23, 2014 at 23:05 UTC